A ready-to-run tutorial on how to build a structured dataset from a text.

Image by Gerd Altmann from Pixabay

In this tutorial, I illustrate how to build a dataset from text. As an example, I consider a birth register that contains the following text:

On August 21 1826 a son was born to John Bon and named him Francis.
On June 11 1813 a daughter was born to James Donne naming her Mary Sarah.
On January 1 1832 a son was born to his father David Borne and named him John.
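
The teaser stops before showing how the dataset is built, so the following is only a minimal sketch of one possible approach, not necessarily the one used in the tutorial. It assumes a regular expression is enough for these three sentences; the pattern `ENTRY`, the function `parse_register`, and the field names are hypothetical and written for this example only.

```python
import re

REGISTER = """\
On August 21 1826 a son was born to John Bon and named him Francis.
On June 11 1813 a daughter was born to James Donne naming her Mary Sarah.
On January 1 1832 a son was born to his father David Borne and named him John.
"""

# Hypothetical pattern: it only covers the wording of the three sample lines.
ENTRY = re.compile(
    r"On (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}) (?P<year>\d{4}) "
    r"a (?P<child>son|daughter) was born to "
    r"(?:(?:his|her) father )?(?P<father>.+?) "
    r"(?:and named|naming) (?:him|her) "
    r"(?P<name>.+)\."
)


def parse_register(text):
    """Turn each matching sentence into a dict with one key per field."""
    records = []
    for line in text.splitlines():
        match = ENTRY.fullmatch(line.strip())
        if match is None:
            continue  # skip lines that do not follow the expected wording
        record = match.groupdict()
        record["day"] = int(record["day"])
        record["year"] = int(record["year"])
        records.append(record)
    return records


records = parse_register(REGISTER)
for record in records:
    print(record)
```

The resulting list of dictionaries can be passed to a tabular library or written to CSV. A real register with more varied wording would likely need a more tolerant pattern or a proper NLP pipeline.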


Learn math from these Python packages

Photo by Jeswin Thomas on Unsplash

As data scientists, we are constantly told that we need to understand machine learning because it is one of the tools that lets us do our job. I understand that many newcomers to the field learn machine learning without a deeper understanding of the concepts and equations behind it, relying only on using the algorithms.

The most important foundation for understanding machine learning is math. When you hear "math", it will inevitably remind you of high school lessons: hard, confusing, and theoretical. Machine learning math surely is similar, but in this modern era, it’s different from the…


Get more confident and save time in interviews by learning these common coding interview patterns in Python

Lesson 1: Use List Comprehension!

When coding in real life, I sometimes forget syntax and need to resort to Google. Sadly, this luxury isn’t available during coding interviews. To address this, I’ve been reviewing common syntax patterns in Python for coding interviews. Syntax isn’t as important as understanding core algorithms and data structure concepts in the first place, but for me, reviewing syntax instills confidence in my code and saves me invaluable time. I hope it does the same for you.

Part 1: Lists

1. Sorting

  • sorted(numbers) returns a new list with the numbers in ascending order and leaves the original list unchanged. You can also use numbers.sort(), which sorts the list in place and returns None.
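
A small sketch of the difference between the two calls, using a made-up list named `numbers`:

```python
numbers = [3, 1, 2]

# sorted() builds a new list; the original is left untouched.
ascending = sorted(numbers)
descending = sorted(numbers, reverse=True)

# list.sort() reorders the list itself and returns None,
# so work on a copy here to keep `numbers` unchanged.
in_place = list(numbers)
result = in_place.sort()

print(numbers, ascending, descending, in_place, result)
```

Because `list.sort()` returns `None`, writing `numbers = numbers.sort()` silently discards the list, which is an easy mistake to make under interview pressure.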


A quick guide on the differences between quantitative and qualitative data.

Many engineers have never worked in statistics or data science. But when they build data science pipelines or rewrite code produced by data scientists into adequate, easily maintained code, many nuances and misunderstandings arise on the engineering side. I am writing this series of posts for those Data/ML engineers and novice data scientists. I’ll try to explain some basic approaches in plain English and, building on them, explain some basic Data Science concepts.

The whole series:

Defining the type of variable you are working with is always the first step…


WeightWatcher is based on theoretical research (done jointly with UC Berkeley) into Why Deep Learning Works, and builds on our Theory of Heavy-Tailed Self-Regularization (HT-SR). It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.

Are your models over-trained? The weightwatcher tool can detect the signatures of overtraining in specific layers of pre/trained Deep Neural Networks.

In the Figure above, fig (a) is well trained, whereas fig (b) may be over-trained. That orange spike on the far right is the tell-tale clue; it’s what we call a Correlation Trap.

Weightwatcher can detect the signatures of overtraining…


How machine learning allows us to overcome impossible challenges, but still takes a lot of time and money

Introduction

“I thought AlphaGo was based on probability calculations, and it was merely a machine. But when I saw this movie, I changed my mind. Surely AlphaGo is creative. The move was really creative and beautiful” (AlphaGo documentary, 52:10–52:40).

In this quote, Lee Sedol, the greatest player to ever touch the game of Go, reacted to the infamous move 37 in one of his games against the reinforcement learning agent AlphaGo.

This highlights the kind of magical aura that surrounds machine learning and especially deep learning. …


Let’s see how we can access Google Sheets from Python without using strange integrations

Photo by Scott Graham on Unsplash

Google Sheets is a very powerful (and free) tool for creating spreadsheets. I’ve almost replaced LibreOffice Calc with Sheets because it’s very comfortable to work with. Sometimes, a data scientist has to pull some data from a Google Sheet into a Python notebook. In this article, I’ll show you how to do it using just Pandas.

The first thing to do is to create a Google Sheet. For this example, it will contain just 2 columns, one of which (the Age) has one missing value.

This is the dataset we’re going to work with.

Now we have to make it…
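
The teaser cuts off here, so the author's exact steps are not shown. As a hedged sketch of one common way to read a sheet with just Pandas: if the sheet is shared so that anyone with the link can view it, Google serves it as CSV from an export URL, and `pandas.read_csv` can read that URL directly. The helper names `csv_export_url` and `load_sheet`, and the `sheet_id` placeholder, are assumptions made for this example.

```python
def csv_export_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab (gid) of a link-shared sheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def load_sheet(sheet_id: str, gid: int = 0):
    """Read the sheet into a DataFrame; empty cells become NaN."""
    # Imported here so the URL helper can be used without pandas installed.
    import pandas as pd

    return pd.read_csv(csv_export_url(sheet_id, gid))


# "YOUR_SHEET_ID" is a placeholder, not a real document.
print(csv_export_url("YOUR_SHEET_ID"))
```

Calling `load_sheet` with the ID taken from the sheet's own URL would return a DataFrame in which the missing Age value shows up as `NaN`.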


These Jupyter Notebook extensions make a data scientist’s life easier

Image by Bessi from Pixabay

Every data scientist spends most of their time on data visualization, preprocessing, and model tuning based on the results. These are the toughest stages, because you only get a good model when you perform all three steps precisely. There are 10 very helpful Jupyter notebook extensions to help in these circumstances.

1. Qgrid

Qgrid is a Jupyter notebook widget that uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting and filtering controls, as well as edit your DataFrames by double-clicking cells.


If you work in a scientific field, you should try to build a deep and unbiased understanding of that field. This not only educates you in the best possible way but also helps you envision the opportunities in your space.

A research paper is often the culmination of a wide range of deep and authentic practices surrounding a topic. When writing a research paper, the author thinks critically about the problem, performs rigorous research, evaluates their processes and sources, organizes their thoughts, and then writes. These genuinely-executed practices make for a good research paper.

If you’re struggling to build a…


Data structures and algorithms are some of the most essential topics for programmers, both to get a job and to do well on a job. Good knowledge of data structures and algorithms is the foundation of writing good code.

If you are familiar with essential data structures (e.g., arrays, strings, linked lists, trees, and maps) and advanced ones such as tries and self-balancing trees like AVL trees, you’ll know when to use which data structure and how to estimate the CPU and memory cost of your code.

Even though you don’t need to write your own array, linked list, or hashtable, given…

Muppala Sunny Chowdhary

Developer, Data Analyst & Trying to explore the Best Version of Myself ❤
