The Real Python Podcast

Focusing on Data Science & Less on Engineering and Dependencies

How do you manage the dependencies of a large-scale data science project? How do you move that project from a laptop to cloud infrastructure, or take advantage of GPUs and multiple instances running in parallel? This week on the show, Savin Goyal returns to discuss recent updates to the open-source framework Metaflow.

Savin briefly describes the Metaflow platform and its goal of reducing engineering overhead for data scientists and programmers. We discuss how the platform captures snapshots of a project as you work, allowing you to return to an earlier state or share the current state of your project with another team member.

We dig into the complicated process of managing dependencies for machine learning and data science projects. Savin describes how a flow can declare the external libraries it requires using the new @pypi decorator or the @conda decorator. Because those dependencies travel with the flow, a project can scale from a local machine to the cloud or to multiple instances with everything it needs included.

He talks about starting a new company, Outerbounds, with former colleagues from Netflix. Their vision is to keep building the open-source Metaflow platform while offering customers scalable, enterprise-grade infrastructure.

This week’s episode is brought to you by Intel.

Course Spotlight: Everyday Project Packaging With pyproject.toml

In this Code Conversation video course, you’ll learn how to package your everyday projects with pyproject.toml. Playing on the same team as the import system means you can call your project from anywhere, ensure consistent imports, and have one file that’ll work for many build systems.

Topics:

  • 00:00:00 – Introduction
  • 00:02:25 – Update on Metaflow
  • 00:04:13 – What is Outerbounds?
  • 00:07:26 – An ML platform to serve data scientists' needs
  • 00:13:02 – Dependency reproducibility via @conda and @pypi decorators
  • 00:26:18 – Sponsor: Intel
  • 00:27:10 – Storing lock files along with snapshots
  • 00:29:17 – Working alongside code and dependency management systems
  • 00:34:03 – Scaling a project from laptop to the cloud
  • 00:40:13 – Video Course Spotlight
  • 00:41:41 – Getting visibility on processes
  • 00:47:23 – Adjusting your project due to GPU availability
  • 00:52:27 – Example of jumping back into a project one year later
  • 00:55:54 – What are you excited about in the world of Python?
  • 00:57:39 – What do you want to learn next?
  • 00:59:35 – How can people follow your work online?
  • 01:00:19 – Thanks and goodbye
Show Links:

  • Metaflow - a framework for real-life ML, AI, and data science
  • Infrastructure for ML, AI, and Data Science - Outerbounds
  • Human-Friendly, Production-Ready Data Science with Metaflow - Savin Goyal | SciPy 2022 - YouTube
  • Episode #61: Scaling Data Science and Machine Learning Infrastructure Like Netflix – The Real Python Podcast
  • New in Metaflow: The Long-Awaited @pypi Decorator - Outerbounds
  • Managing Dependencies - Metaflow Docs
  • Secure ML with Secure Software Dependencies - Outerbounds
  • Directed acyclic graph (DAG) - Wikipedia article
  • Visualizing Results - Metaflow Docs
  • Seamless Data and ML Pipelines with Airflow and Metaflow - Outerbounds
  • Episode #142: Orchestrating Large and Small Projects With Apache Airflow – The Real Python Podcast
  • Savin (@SavinGoyal) - X
  • Savin Goyal - LinkedIn
  • Building the ML-driven future - Outerbounds Blog

Level up your Python skills with our expert-led courses:

  • Everyday Project Packaging With pyproject.toml
  • Combining Data in pandas With concat() and merge()
  • Histogram Plotting in Python: NumPy, Matplotlib, Pandas & Seaborn

Support the podcast & join our community of Pythonistas

The Real Python Podcast by Real Python