How Netflix Uses Data

Netflix has come incredibly far since its humble start as a mail-order DVD rental company in 1997, a business it kept running until 2023. Few companies have adapted and changed as quickly and gracefully as Netflix.

Their secret? Data science.

Netflix began experimenting with data in 2006 when they held a competition to create an algorithm that would “substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.” Since then, Netflix has taken data beyond rating prediction and into personalized ranking, page generation, search, image selection, messaging, marketing, and more.

What Is The Netflix Recommendation Engine?

Netflix's most successful system, the Netflix Recommendation Engine (NRE), is a collection of algorithms that filter content based on each individual user profile. The engine filters over 3,000 titles at a time using 1,300 recommendation clusters based on user preferences. It’s so accurate that 80% of Netflix viewer activity is driven by personalized recommendations from the engine. As of 2016, the NRE was estimated to save Netflix over $1 billion per year in customer acquisition.

How Netflix’s Algorithm Works

There are a number of factors that affect the personalized recommendation system. These factors include:

  • Individual viewing history
  • The user ratings for other titles watched
  • Other Netflix users with similar interests and viewing history
  • Information about the titles watched such as the genre, cast, and release date

The Netflix algorithm uses machine learning systems such as reinforcement learning, matrix factorization, and other algorithmic approaches to rank and organize titles.
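To make matrix factorization a little more concrete, here is a minimal sketch in Python. It is not Netflix's implementation: the tiny ratings table, the function names `factorize` and `observed_rmse`, and every hyperparameter are assumptions chosen for illustration. The idea is to learn a short vector of hidden "taste" factors for each user and each title so that their dot product approximates the ratings we already know, then use the same dot product to guess the ratings we don't.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn user and title factors from observed ratings with plain SGD.

    ratings: 2D array where 0 marks a (user, title) pair with no rating.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_titles = ratings.shape
    user_f = rng.normal(scale=0.1, size=(n_users, n_factors))
    title_f = rng.normal(scale=0.1, size=(n_titles, n_factors))
    observed = np.argwhere(ratings > 0)  # (user, title) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)  # visit known ratings in a random order
        for u, i in observed:
            err = ratings[u, i] - user_f[u] @ title_f[i]
            u_old = user_f[u].copy()
            # Nudge both vectors to shrink the error; reg keeps them small.
            user_f[u] += lr * (err * title_f[i] - reg * user_f[u])
            title_f[i] += lr * (err * u_old - reg * title_f[i])
    return user_f, title_f

def observed_rmse(ratings, user_f, title_f):
    """RMSE between predictions and the ratings that are actually known."""
    mask = ratings > 0
    predicted = user_f @ title_f.T
    return float(np.sqrt(np.mean((ratings[mask] - predicted[mask]) ** 2)))

# Hypothetical data: rows are users, columns are titles, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

user_f, title_f = factorize(ratings)
predicted = user_f @ title_f.T  # includes guesses for the unrated pairs
print(np.round(predicted, 1))
```

The zeros in the table are the interesting part: after training, `predicted` holds a score for every user and title, including the pairs nobody has rated yet, and those scores are what a recommender would rank.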

Titles are grouped into categories, and each category is presented to the user as a horizontal row. The most strongly recommended rows sit at the top of the page, which usually includes a “Top 10” row and a “For You” row.

The two-tiered recommendation system, the rows and the titles within each row, is an intuitive design that helps users find a particular title and also helps Netflix gain information about the user as they scroll through the interface. As Netflix uses data points to continuously update the rows with recommended options, it ensures that customers can find their next binge-worthy title as quickly as possible.
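The two-tier idea (rank the titles inside each row, then rank the rows themselves) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `build_homepage`, the placeholder titles, the scores, and the choice to score a row by the average of its top titles are all made up for this example and are not how Netflix is known to do it.

```python
def build_homepage(scores_by_row, max_rows=3, max_titles=3):
    """Order titles within each row, then order the rows.

    scores_by_row: {row_name: {title: predicted_score}} (hypothetical input).
    A row's score is the mean score of its top titles, an assumed heuristic.
    """
    ranked_rows = []
    for row_name, title_scores in scores_by_row.items():
        if not title_scores:
            continue  # nothing to show in an empty row
        top_titles = sorted(title_scores, key=title_scores.get, reverse=True)
        top_titles = top_titles[:max_titles]
        row_score = sum(title_scores[t] for t in top_titles) / len(top_titles)
        ranked_rows.append((row_score, row_name, top_titles))
    ranked_rows.sort(key=lambda row: row[0], reverse=True)
    return [(name, titles) for _, name, titles in ranked_rows[:max_rows]]

# Placeholder rows, titles, and scores for one imaginary user.
scores = {
    "Comedies": {"Title A": 0.2, "Title B": 0.6},
    "Thrillers": {"Title C": 0.9, "Title D": 0.7, "Title E": 0.1},
    "Documentaries": {"Title F": 0.3},
}

homepage = build_homepage(scores, max_rows=2, max_titles=2)
print(homepage)
# [('Thrillers', ['Title C', 'Title D']), ('Comedies', ['Title B', 'Title A'])]
```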

The Cold Start Problem

Netflix's recommendation system also has to cut through the cold start problem. When a new subscriber joins Netflix, the recommendation system has no viewing history to draw on because none exists yet.

To solve this problem, Netflix asks a series of questions through an initial survey to help determine a user's tastes and preferences. Then, Netflix recommends titles based on users with similar tastes.

If the new subscriber skips the initial survey, Netflix will recommend a diverse and popular set of shows and movies. Once the user starts engaging with the content, Netflix will begin to deliver a more personalized experience with each show and movie the viewer watches.
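Here is a rough Python sketch of that fallback logic. It simplifies heavily: instead of matching the new subscriber to users with similar tastes, it just filters a small made-up catalog by the genres picked in the survey, and when there is no survey it returns the most popular title from each genre so the list stays diverse. The function name `cold_start_recommend`, the catalog fields, and the popularity numbers are all assumptions.

```python
def cold_start_recommend(catalog, survey_genres=None, k=3):
    """Pick titles for a subscriber with no viewing history.

    catalog: list of dicts with 'title', 'genre', and 'popularity' keys
    (a hypothetical schema used only for this example).
    """
    by_popularity = sorted(catalog, key=lambda t: t["popularity"], reverse=True)
    if survey_genres:
        # Survey answered: most popular titles in the chosen genres.
        matches = [t["title"] for t in by_popularity if t["genre"] in survey_genres]
        return matches[:k]
    # Survey skipped: popular but diverse, at most one title per genre.
    picks, seen_genres = [], set()
    for t in by_popularity:
        if t["genre"] not in seen_genres:
            picks.append(t["title"])
            seen_genres.add(t["genre"])
    return picks[:k]

# Placeholder catalog.
catalog = [
    {"title": "Title A", "genre": "drama", "popularity": 0.9},
    {"title": "Title B", "genre": "drama", "popularity": 0.8},
    {"title": "Title C", "genre": "comedy", "popularity": 0.7},
    {"title": "Title D", "genre": "documentary", "popularity": 0.6},
    {"title": "Title E", "genre": "comedy", "popularity": 0.5},
]

print(cold_start_recommend(catalog))                            # ['Title A', 'Title C', 'Title D']
print(cold_start_recommend(catalog, survey_genres={"comedy"}))  # ['Title C', 'Title E']
```

Once real viewing data starts to arrive, a system like this would hand over to the personalized models described earlier.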

Other Popular Recommendation Engines

Netflix isn’t the only company using a recommendation engine. Amazon, LinkedIn, Spotify, Instagram, YouTube, and many other web platforms all use recommendation engines to predict their users’ preferences and boost their business.

But Netflix clearly has the most successful engine. Among North Americans, 47% prefer Netflix, which has a 93% retention rate. Amazon Prime comes in second at only 14%, and every other subscription streaming service lingers in the single digits.

The Evolution of Netflix’s Algorithm

When Netflix launched its streaming service in 2007, it gained access to significantly more viewer data than it had from its prior business model of mailing out DVDs.

With the change in business came a change in the way Netflix made personalized recommendations. The 2007 Progress Prize, an annual award within the competition launched in 2006, produced improved models on which Netflix built its recommendation system. The winning algorithm improved the RMSE (root mean squared error, the benchmark used in the competition) by 8% over Cinematch, the algorithm Netflix had been using before 2007.
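If RMSE is new to you, it measures how far predicted ratings land from the real ones: square each error, average the squares, and take the square root. Lower is better. The short Python sketch below shows the calculation and how a percentage improvement over a baseline is worked out. The ratings are invented for illustration and are not data from the competition, and the helper names `rmse` and `relative_improvement` are our own.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100

# Invented example ratings on a 1-5 scale.
actual = [4, 3, 5, 1]
baseline_predictions = [3, 3, 4, 2]  # stand-in for an older model
new_predictions = [4, 3, 4, 1]       # stand-in for an improved model

baseline = rmse(baseline_predictions, actual)
improved = rmse(new_predictions, actual)
print(f"baseline RMSE: {baseline:.3f}")
print(f"improved RMSE: {improved:.3f}")
print(f"improvement:   {relative_improvement(baseline, improved):.1f}%")
```

Because the errors are squared before averaging, a few badly wrong predictions hurt the score more than many slightly wrong ones.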

Netflix’s success has been clear since its inception — it currently stands as the most popular subscription streaming service and its recommendation engine has played a large role in that success.

Netflix tracks data points like:

  • Time and date a Netflix user watched a title
  • User profile information such as age, gender, location, and selected favorite content upon sign up
  • The device used to stream
  • If the show was paused, rewound, or fast-forwarded
  • If the viewer resumed watching after pausing
  • Whether an entire TV series or movie was completed
  • How long it took a viewer to watch an entire TV series
  • Whether the viewer gave the show or movie a thumbs up
  • Scenes users have viewed repeatedly
  • The number of searches and what is searched for
  • Where a user watched the show (by postal code)
  • Browsing and scrolling behavior
  • Screen shots when the show was paused, when the user left the show, and when the user watches a scene more than once

These data points are aggregated and processed by Netflix’s algorithms to continuously improve its recommendations to serve the best content for each user’s personal tastes.

As streaming services continue to grow in popularity, it will be interesting to see how Netflix and other subscription streaming services continue to use machine learning and artificial intelligence over the coming years to take advantage of all the data points being collected.

Original Content and Marketing

Another area where Netflix uses data is in determining the viability of its original content. The streaming giant’s original content is successful 93% of the time.

The typical television show has only a 35% chance of succeeding. Netflix’s choices about greenlighting original content aren’t random. They’re based on data too – unlike television which relies on tradition, opinion, and sometimes luck. In order to ensure that original content will be successful, Netflix uses aggregated user data to predict trends and create new content accordingly.

Netflix also uses data to create targeted marketing campaigns for that original content. They cut over ten different versions of trailers for content that they expect to be popular.

Take House of Cards, for example. If your user profile indicated you liked “strong female leads,” you would see the previews featuring Robin Wright, who played Claire Underwood. Other trailers focused on the director; on Kevin Spacey and his character, Francis Underwood; on the political aspects; and more. An algorithm chose which one to show you, with a nearly 90% guarantee that you would enjoy it or at least be interested in watching the first episode.

Data Science and Research at Netflix

Netflix has gone beyond using data analytics to boost their business and has developed an entire research department, which is integrated into their business and engineering teams. They’ve released open-source machine learning algorithms and Python frameworks aimed at boosting the productivity of Data Scientists and businesses.

From the recommendation engine to choosing which original shows and movies to make, Netflix knows exactly how to capture their audience and continue growing because they have harnessed the large amounts of data at their disposal. As a customer-centric business, they effectively use data science to market their service as a personalized streaming experience for each unique individual. They’re more than a streaming company; they’re also a data giant within the entertainment industry.
