The power of data is perhaps best captured in Moneyball. The book and subsequent film tell the story of Billy Beane, the legendary general manager of the Oakland Athletics who used statistical analysis to reinvent the game of baseball. With one of the league’s smallest budgets, Beane relied on data to predict how many runs a player would score and then built a roster of undervalued—but talented—players to compete against rivals with deeper pockets.

Spoiler alert: it worked. A little while after they adopted this approach, the Oakland A’s started to win big. They even became the first team in over 100 years of American League baseball to win 20 consecutive games.

Fast forward several years, and you’d be hard-pressed to find an industry that isn’t applying Moneyball-like strategies to make smarter decisions. Telecommunications companies are tracking calls to refine their customers’ experience. Health care experts are using data to develop a deeper understanding of patients and ultimately improve outcomes. Media service providers are turning to data to not only personalize content but also produce entirely new shows for viewers. The ways industries can use data are virtually limitless.

While organizations have more data than ever, people who have the skills to put it to good use are rare. The outcomes? Lost revenue, dissatisfied customers, disengaged employees—to name a few.

The good news is that there’s a straightforward five-step process that can be followed to extract insights from data, identify new opportunities, and drive growth. And better yet, the ability to do so isn’t limited to data scientists or math geniuses. People across all disciplines and at all stages of their careers can develop the skills to analyze data. It’s useful whether one is looking to upskill in a career or move into an entirely new industry.

Data analysis follows a detailed step-by-step process. In this post, we’ll walk you through this process to help you start a potential career in data science.

Jump to section:

  1. Ask The Right Questions
  2. Data Collection
  3. Data Cleaning
  4. Analyzing The Data
  5. Interpreting The Results

Step One: Ask The Right Questions

So you’re ready to get started. With no time to waste in discovering what makes your customers or employees tick, you quickly set out to collect as much data as you can get your hands on by digging through records and surveys. The more the better, right?

…Not so fast.

Before you start collecting data, you need to first understand what you want to do with it. Take some time to think about a specific business problem you want to address or a hypothesis that could be tested with data. From there, you’ll create a set of measurable, clear, and concise questions that will help you address that problem or test that hypothesis.

For example, an advertiser who wants to boost their client’s sales may ask if customers are likely to purchase from them after seeing an ad. Or an HR director who wants to reduce turnover might want to know why their top employees are leaving their company.

Starting with a clear objective is an essential step in the data analysis process. Once you’ve recognized the business problem that you want to solve and set well-defined goals, it’ll be much easier to decide on the data you need.

Step Two: Data Collection

This brings us to the next step: data collection. Now that you have a solid idea of your questions, it’s time to define what type of data you need to find those answers. There are three types of data:

First Party Data

First-party data, also known as 1P data, is data that a company collects directly from its customers. This data source improves your ability to engage with your customers. It also allows you to develop a data strategy to ensure that you are catering to your customers’ interests.

Examples: Customer surveys, purchase information, customer interviews, and in-store interactions.

Second Party Data

Second party data is another organization’s first-party data, shared with you by a trusted partner or company. The additional benefit of this data set is that it can reveal more insights about your customers. This can help your company spot budding trends and forecast future growth.

Examples: Social media activity, app activity or website interactions.

Third Party Data

Third party data is any data collected by an organization or entity that doesn’t have a direct relationship with the individual the data is being collected from.

This data often consists of unstructured, semi-structured, or structured data points collected at very large scale, commonly known as big data. Big data is often analysed using machine learning and predictive analytics, for example to build industry reports.

Examples: Open data repositories or government resources.

Whatever type of data you use, the end goal of this step is to make sure to have a complete, 360-degree view of the problem you want to solve.

Step Three: Data Cleaning

You’ve collected and combined data from multiple sources. Great. But it’s not yet time to roll up your sleeves and dive into it. Here’s where you’ll spend some time polishing the data to ensure it’s in tip-top shape.

Data cleaning converts raw data into data that is suitable for analysis. This process involves removing incorrect data and checking for incompleteness or inconsistencies. Data cleaning is a vital step in the data analysis process because the accuracy of your analysis will depend on the quality of your data.

For instance, you might strip stray spaces around letters or symbols, remove duplicate records, or handle missing values. While these issues may seem minor, they can be quite pernicious: even small inaccuracies can skew your results. Raw data is seldom usable in its original form, so expect to find flaws like these in almost any data set.
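To make the cleaning step concrete, here is a minimal sketch in plain Python. The rows, the column names (customer, city, spend), and the clean_rows helper are all made up for illustration, and real projects often use a library such as pandas instead. It trims stray spaces, drops duplicates, and keeps missing values visible rather than guessing at them.

```python
# Hypothetical raw survey rows; names and values are invented for illustration.
raw_rows = [
    {"customer": "  Ana ", "city": "Austin", "spend": "120.50"},
    {"customer": "Ben", "city": " Boston", "spend": ""},        # missing spend
    {"customer": "Ana", "city": "Austin", "spend": "120.50"},   # duplicate once trimmed
    {"customer": "Cy", "city": "Chicago", "spend": "80"},
]


def clean_rows(rows):
    """Trim whitespace, drop exact duplicates, and convert spend to a number."""
    seen = set()
    cleaned = []
    for row in rows:
        # Remove stray spaces around every text value.
        trimmed = {key: value.strip() for key, value in row.items()}
        # Use the trimmed values as a fingerprint to detect duplicates.
        fingerprint = tuple(sorted(trimmed.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Keep missing values explicit (None) rather than inventing a number.
        trimmed["spend"] = float(trimmed["spend"]) if trimmed["spend"] else None
        cleaned.append(trimmed)
    return cleaned


cleaned_rows = clean_rows(raw_rows)
missing_spend = [row["customer"] for row in cleaned_rows if row["spend"] is None]
print(len(cleaned_rows), "rows kept; missing spend for:", missing_spend)
```

Note that the duplicate only shows up after trimming, which is why the order of the cleaning steps matters.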

Step Four: Analyzing The Data

You now have a wealth of data. You’ve spent time cleaning it up. It’s as organized as it’ll ever be. Now you’re ready for the fun stuff.

In this step, you’ll begin to slice and dice your data to extract meaningful insights. There are many different data analysis techniques and processes that you can use. Let's explore the steps in a standard data analysis.

Data Analysis Steps & Techniques

1. Exploratory Analysis

Exploratory data analysis seeks to uncover insights about your data before the formal analysis begins. This method can save you a lot of time because it shows whether your data is appropriate for the given problem. There are five goals of exploratory data analysis:

  1. Uncover and resolve data quality issues such as missing data
  2. Uncover high-level insights about your data set
  3. Detect anomalies in your data set
  4. Understand existing patterns and correlations between variables
  5. Create new variables using your business knowledge

Tools and Software: Python, R and Excel
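As a rough illustration of the first three goals, the sketch below uses only Python's standard library on a made-up list of daily sales. The numbers are invented, and the "more than 2 standard deviations from the mean" rule is just one common rule of thumb for spotting anomalies, not a definitive test.

```python
from statistics import mean, median, stdev

# Hypothetical daily unit sales; None marks a day with no recorded value.
daily_sales = [20, 22, None, 19, 21, 95, 23, 20]

# Goal 1: surface data quality issues such as missing values.
missing_count = sum(1 for value in daily_sales if value is None)
present = [value for value in daily_sales if value is not None]

# Goal 2: get a high-level summary of the values that are present.
summary = {
    "count": len(present),
    "missing": missing_count,
    "mean": round(mean(present), 2),
    "median": median(present),
}

# Goal 3: flag possible anomalies with a rough rule of thumb
# (more than 2 sample standard deviations away from the mean).
center, spread = mean(present), stdev(present)
anomalies = [value for value in present if abs(value - center) > 2 * spread]

print(summary)
print("possible anomalies:", anomalies)
```

A large gap between the mean and the median, as in this toy data, is itself a hint that an outlier is pulling the average around.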

2. Descriptive Analysis

Descriptive analysis seeks to answer the question, “What happened?”. This method will identify what is doing well and what is in need of improvement. It also lays the foundation for more advanced data analysis processes.

Example: You own a clothing store that sells products that range from t-shirts to winter jackets. A descriptive analysis will tell you which products are your best and worst sellers.

Tools and Software: SQL and DAX
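The clothing store example could be sketched as follows. This version uses Python rather than SQL or DAX, and the sales log is entirely hypothetical. It totals the units sold per product and reads off the best and worst sellers.

```python
from collections import Counter

# Hypothetical sales log for the clothing store: (product, units sold).
sales_log = [
    ("t-shirt", 30),
    ("winter jacket", 5),
    ("jeans", 12),
    ("t-shirt", 25),
    ("scarf", 3),
    ("jeans", 8),
]

# Total the units sold for each product.
units_by_product = Counter()
for product, units in sales_log:
    units_by_product[product] += units

# most_common() sorts from highest to lowest total.
ranked = units_by_product.most_common()
best_seller = ranked[0]
worst_seller = ranked[-1]

print("best seller:", best_seller)
print("worst seller:", worst_seller)
```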

3. Diagnostic Analysis

Diagnostic analysis seeks to answer the question, “Why did this happen?”. This method of analysis is the most abstract and involves detecting correlations between different variables.

Example: Your clothing store saw a decrease in revenue for t-shirt sales. A diagnostic analysis will look at the relationship between variables such as seasonality, the location of the t-shirts within the store and social media engagement with t-shirt revenue to determine which one has the strongest correlation. In this case, you determined that seasonality had the biggest impact.

Tools and Software: R, Python, Orange, and Weka
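One simple way to compare candidate explanations is to compute a correlation coefficient for each one. The sketch below does this with a hand-written Pearson correlation on invented monthly figures. The factor names are assumptions, and average temperature stands in as a crude proxy for seasonality. A strong correlation is a lead worth investigating, not proof of a cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical monthly figures for the clothing store (six months).
tshirt_revenue = [900, 1100, 1500, 1900, 2100, 1700]
factors = {
    "avg_temperature": [5, 9, 15, 21, 24, 18],         # proxy for seasonality
    "social_engagement": [300, 280, 350, 310, 330, 290],
    "near_entrance": [1, 0, 1, 0, 1, 0],                # 1 = displayed by the door
}

# Correlate each factor with revenue, then pick the strongest by magnitude.
correlations = {
    name: pearson(values, tshirt_revenue) for name, values in factors.items()
}
strongest = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
print("strongest relationship:", strongest)
```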

4. Predictive Analysis

Predictive analysis seeks to answer the question, “Will this happen again?”. This method of analysis estimates what is likely to happen in the future based on past data.

Example: Your clothing store knows that t-shirt revenue will decrease in the winter months, but by how much? Predictive analysis will use your store’s historical data to create future revenue projections. This will give you an estimation of what your t-shirt revenue will be in the winter months.

Tools and Software: R and Python
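As a toy version of such a projection, the sketch below fits a straight-line trend to four years of made-up December t-shirt revenue and extends it one year ahead. A least-squares line is only one of many forecasting techniques, and both the figures and the fit_line helper are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical December t-shirt revenue for the past four years.
years = [0, 1, 2, 3]               # 0 = oldest year on record
december_revenue = [800, 770, 710, 690]

slope, intercept = fit_line(years, december_revenue)

# Project the trend one year past the last observation.
next_year = 4
forecast = intercept + slope * next_year

print(f"trend: {slope:.1f} per year")
print(f"projected December revenue: {forecast:.0f}")
```

With so few data points the projection is only a rough estimate, so it is worth treating the result as a range rather than a single number.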

5. Prescriptive Analysis

Prescriptive analysis seeks to answer the question, “What should we do?”. This method of analysis determines the best course of action based on previous analyses. The result is that you are able to take action according to future trends.

Example: Your clothing store is predicted to sell 50 t-shirts in December, but you only have 40 t-shirts in your inventory. A prescriptive analysis will determine that you should order 15 more t-shirts: 10 to meet the predicted demand and 5 as a buffer should the actual demand be higher.

Tools and Software: R and Python
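The arithmetic behind the t-shirt example can be written as a small rule. The function name and the 5-unit buffer are assumptions. The buffer is inferred from the example's numbers (15 ordered, minus the 10 needed to cover the shortfall), and a real store would choose it based on how uncertain the forecast is.

```python
def units_to_order(predicted_demand, inventory, buffer_units):
    """Order enough to cover predicted demand plus a safety buffer."""
    target_stock = predicted_demand + buffer_units
    # Never order a negative quantity if stock already exceeds the target.
    return max(0, target_stock - inventory)


# Figures from the clothing store example; the buffer of 5 is an assumption.
order = units_to_order(predicted_demand=50, inventory=40, buffer_units=5)
print("t-shirts to order:", order)
```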

Step Five: Interpreting The Results

Once you’ve interpreted the results and drawn meaningful insights from them, the next step is to create data visualizations that communicate those insights. There are many tools for this. Let's explore two popular ones that many data analysts use.

Popular Tools For Data Visualization

Tableau

Tableau is a valuable tool that more than 220,000 data analysts use. It allows you to convert text or numerical information into an interactive visual dashboard. It also uses an API to deploy any machine learning models that you have developed.

Microsoft Power BI

Microsoft Power BI is another great tool for creating data visualizations. This software has features such as data warehousing, data discovery, and a cloud-based interface. This allows you to easily build visual dashboards.

But pretty visualizations aren’t all that are needed here. If you want your findings to be implemented, you need to be able to present them to decision-makers and stakeholders in a manner that’s compelling and easy to comprehend. The best way to do this is through what’s called data storytelling, which involves turning your data into a compelling narrative. The goal of data storytelling is to propose a solution using appropriate business metrics that are directly related to your company’s key performance indicators.

A Whole New Ball Game

Thanks to computers and the internet, we live in a world that’s flooded with data. Nowadays, the ability to analyze it isn’t limited to data scientists. With the right training, anyone can think like a data analyst and find the answers they need to tackle some of their greatest business problems.

There’s no better time to learn this skill. As data continues to transform the way countless industries operate, there’s been a huge increase in demand for people who have the analytical chops to make the most of it. Whether you’re in advertising, retail, healthcare, or another field, learning these five stages of data analysis can help you, too, knock it out of the park.

Begin your own data analysis with our free online Python course.