The Real Python Podcast

Scaling Data Science and Machine Learning Infrastructure Like Netflix



Would you like to move your data science project from a laptop to the cloud? Would you also like to have snapshots of your project saved along the way so that you can go back in time or share the state of your project with another team member? This week on the show, we have Savin Goyal from Netflix. Savin is the technical lead for machine learning infrastructure at Netflix. He joins us to talk about Metaflow, an open-source tool to simplify building, managing, and scaling data science projects.

Metaflow addresses the needs of the numerous data scientists who work at Netflix. Machine learning is a key strength for the streaming service. The team tried several existing tools to scale its internal infrastructure and, after this experimentation, developed Metaflow.

We talk about the history of the project and how someone could get started with the open-source version. Savin also contrasts the cost of infrastructure with the cost of data scientists' time.

Course Spotlight: Simplify Python GUI Development With PySimpleGUI

In this step-by-step course, you’ll learn how to create a cross-platform graphical user interface (GUI) using Python and PySimpleGUI. A graphical user interface is an application that has buttons, windows, and lots of other elements that the user can use to interact with your application.

Topics:

  • 00:00:00 – Introduction
  • 00:01:53 – What is Metaflow?
  • 00:04:15 – Savin’s background in data science and infrastructure
  • 00:06:06 – Democratization of infrastructure and iteration of tools
  • 00:10:34 – What information is saved about the infrastructure requirements for a project?
  • 00:17:17 – How are the requirements annotated?
  • 00:18:39 – Sponsor: DigitalOcean’s App Platform
  • 00:19:15 – How do project snapshots work?
  • 00:29:33 – Cost of infrastructure vs data scientists
  • 00:32:28 – Working with data at Netflix scale
  • 00:37:55 – Video Course Spotlight
  • 00:39:06 – Getting an organization to use new tools and then making them open-source
  • 00:49:51 – Documentation of Metaflow and getting started on solving infrastructure problems
  • 00:53:57 – What made you interested in working on infrastructure tools?
  • 00:55:13 – What is something you are excited about in the world of Python?
  • 00:56:18 – What do you want to learn next?
  • 00:58:14 – Thanks and goodbye

Show Links:

  • Metaflow: A framework for real-life data science
  • Metaflow: Tutorials
  • More Data Science, Less Engineering with Netflix’s Metaflow By Savin Goyal - YouTube
  • R: The R Project for Statistical Computing
  • Tidyverse: R packages for data science
  • Anything you can do, I can do (kinda). Tidyverse pipes in Pandas
  • reticulate: R Interface to Python
  • Apache Airflow: Programmatically author, schedule and monitor workflows
  • Directed acyclic graph (DAG) - Wikipedia article
  • Serializing Objects With the Python pickle Module - Real Python Course

Level up your Python skills with our expert-led courses:

  • Learn Text Classification With Python and Keras
  • Using Jupyter Notebooks
  • Simplify Python GUI Development With PySimpleGUI

Support the podcast & join our community of Pythonistas
