The Real Python Podcast

Managing Large Python Data Science Projects With Dask



What do you do when the data for your data science project doesn’t fit in your computer’s memory? One solution is to distribute it across multiple worker machines. This week on the show, Guido Imperiale from Coiled talks about Dask and managing large data science projects through distributed computing.

We talk about projects where an orchestration system like Dask will help. Dask is designed to take advantage of parallel computing, spreading the work and data across multiple machines. Many familiar techniques for working with pandas and NumPy data are supported with Dask equivalents.
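As a rough illustration of the idea of spreading work across workers, here is a minimal sketch that uses only the Python standard library. It is not Dask's API and it was not shown in the episode. The helper names (`split_into_chunks`, `partial_stats`, `chunked_mean`) are hypothetical, and the workers here are local threads rather than separate machines. The pattern is split the data into chunks, compute a partial result per chunk, then combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_stats(chunk):
    """Work done independently on one chunk: return (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, max_workers=4):
    """Compute a mean by combining per-chunk partial results.

    Hypothetical helper for illustration only: local threads stand in
    for the worker machines of a real distributed system.
    """
    if not values:
        raise ValueError("values must not be empty")
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk is processed on its own; no worker needs the full data.
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Calling `chunked_mean(list(range(1, 101)), chunk_size=10)` processes ten chunks of ten numbers each and combines them into a single mean. A system like Dask applies the same chunk-and-combine approach, but also schedules the chunks across machines and keeps track of where the data lives.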

We also discuss the differences between managed and unmanaged memory. Guido shares advice on how to tackle memory issues while working with Dask.

This week we also talk briefly with Jodie Burchell, who will be a guest host on upcoming episodes. As a data scientist, Jodie will be bringing new topics, projects, and discussions to the show.

Course Spotlight: Exploring Scopes and Closures in Python

In this Code Conversation video course, you’ll take a deep dive into how scopes and closures work in Python. To do this, you’ll use a debugger to walk through some sample code, and then you’ll take a peek under the hood to see how Python holds variables internally.

Topics:

  • 00:00:00 – Introduction
  • 00:01:56 – Guido at PyCon DE 2022
  • 00:02:14 – Working on Dask for Coiled
  • 00:03:27 – Dask project history
  • 00:04:00 – How would someone start to use Dask?
  • 00:10:28 – Managing distributed data
  • 00:11:18 – Data files CSV vs Parquet
  • 00:15:02 – Managed vs unmanaged memory
  • 00:22:42 – Video Course Spotlight
  • 00:24:01 – Dask active memory manager
  • 00:28:36 – Learning best practices and Dask tutorials
  • 00:33:06 – Where is Dask being used?
  • 00:35:45 – What are you excited about in the world of Python?
  • 00:37:55 – What do you want to learn next?
  • 00:40:31 – Thanks, Guido
  • 00:40:40 – Introduction to Jodie Burchell
  • 00:45:28 – Goodbye

Show Links:

  • Coiled | Python for Data Science on the Cloud with Dask
  • Guido Imperiale: Introducing the Dask Active Memory Manager - PyCon DE 2022 - YouTube
  • Active Memory Management on Dask.Distributed - Guido Imperiale | Dask Summit 2021 - YouTube
  • Tackling unmanaged memory with Dask | Coiled
  • The Beginner’s Guide to Distributed Computing | Richard Pelgrim
  • Common Mistakes to Avoid when Using Dask | Coiled
  • File Format | Apache Parquet
  • Dask: Scalable analytics in Python
  • PEP 554 – Multiple Interpreters in the Stdlib | peps.python.org
  • CUDA Python | NVIDIA Developer
  • Rust Programming Language
  • Product : Coiled
  • Coiled (@CoiledHQ) / Twitter
  • Jodie Burchell (@t_redactyl) | Twitter
  • Learn Python through Nursery Rhymes and Fairy Tales: Shari Eskenas - Amazon

Level up your Python skills with our expert-led courses:

  • Data Cleaning With pandas and NumPy
  • Navigating Namespaces and Scope in Python
  • Exploring Scopes and Closures in Python

Support the podcast & join our community of Pythonistas

The Real Python Podcast, by Real Python
