Exploratory Data Analysis (Part 1)

Exploratory Data Analysis (EDA) is an early step in the data science and machine learning workflow. It lets a team investigate its data and discover patterns in it.

More precisely, exploratory data analysis is the crucial process of performing initial investigations on data, using summary statistics and graphical representations, in order to uncover patterns, spot anomalies, test hypotheses, and check assumptions. This work is usually done by data scientists.
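To make "summary statistics" and "spot anomalies" a little more concrete, here is a minimal sketch using only Python's standard `statistics` module. The helper names `summarize` and `iqr_outliers` and the `orders` values are hypothetical, made up purely for illustration, and the outlier rule shown (Tukey's 1.5 × IQR fences) is just one common choice, not necessarily the method used later in this series.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts, invented for this example only.
orders = [12, 15, 14, 13, 16, 15, 14, 98, 13, 15]

if __name__ == "__main__":
    for name, value in summarize(orders).items():
        print(f"{name:>6}: {value:.2f}")
    print("possible anomalies:", iqr_outliers(orders))
```

Notice how a single unusual value (98) pulls the mean well above the median; comparing the two is one quick way a summary can hint that something in the data deserves a closer look.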

The idea is to understand the data first and then extract as many insights from it as possible. EDA is all about making sense of the data before you dive into modeling.

Because this skill is the key to avoiding wild goose chases, it is one of the most crucial (though often underestimated) skills in data science.

Startups use the term “tactical hell” to describe having too many approaches or tactics to choose from, and data science runs into the same problem.

Training a machine learning model is similar to growing a business in many respects: you also have an overwhelming number of “tactics” to choose from.
Should you clean your data more thoroughly? Collect more data? Engineer more features? Try out new algorithms?

Going in blind could spell disaster for your entire project, because there is a lot of trial and error involved. So how do you avoid hitting a dead end? The answer is exploratory analysis. (This is just a fancy way of saying “get to know” your data.)
Consider the following scenario: you’re a commander with finite resources (i.e., time and data). Exploratory analysis is like sending out scouts to find the best place to deploy your soldiers.
Making this decision up front will make the rest of the project go much more smoothly.

In the next post, I’ll share a few steps for how this is done. As always, keep learning.