The Real Python Podcast

Wes McKinney on Improving the Data Stack & Composable Systems



How do you avoid the bottlenecks of data processing systems? Is it possible to build tools that decouple storage and computation? This week on the show, creator of the pandas library Wes McKinney is here to discuss Apache Arrow, composable data systems, and community collaboration.

Wes briefly describes the humble beginnings of the pandas project in 2008 and moving the project to open source in 2011. Since then, he’s been thinking about improvements across the data processing ecosystem.

Wes collaborated with members of the broader data science community to build the in-memory analytics infrastructure of Apache Arrow. Arrow avoids the bottlenecks of repeated data serialization and format conversion. He shares examples of Arrow’s use across the spectrum in tools like Polars and DuckDB.
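To make the serialization bottleneck concrete, here is a minimal sketch that uses only the Python standard library as a stand-in. It does not use Arrow's API; the `array` column and the variable names are illustrative assumptions. It contrasts handing data to a consumer through a serialize/deserialize round trip (which creates a copy) with handing over a view onto the same contiguous buffer, which is the general idea behind sharing one in-memory columnar format between tools.

```python
import pickle
from array import array

# A contiguous column of 64-bit floats (a stand-in for a columnar buffer).
column = array("d", [1.5, 2.5, 3.5])

# Serialization path: the consumer receives an independent copy of the data.
copied = pickle.loads(pickle.dumps(column))

# Shared-buffer path: the consumer receives a view onto the same memory.
view = memoryview(column)

# Change the original column and check who can see the change.
column[0] = 100.0
copy_sees_update = copied[0] == 100.0  # the copy is detached from the source
view_sees_update = view[0] == 100.0    # the view reads the shared buffer

print(copy_sees_update, view_sees_update)
```

The copy costs time and memory proportional to the size of the data, while the view costs almost nothing regardless of size. That difference is what grows into a bottleneck when every tool in a pipeline converts data into its own format.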

Wes advocates moving from vertically integrated tools toward composable data systems. We discuss his work on Ibis, a portable dataframe API for data manipulation and exploration in Python. Ibis supports multiple backends by decoupling the API from the execution engine.
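The idea of decoupling an API from its execution engine can be sketched with the standard library alone. This is not the Ibis API; `Query`, `run_in_python`, and `run_in_sqlite` are hypothetical names invented for illustration. One backend-neutral query description is executed by two different engines: plain Python and SQLite.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical names throughout; this is an illustration, not the Ibis API.
@dataclass(frozen=True)
class Query:
    """Backend-neutral description of 'filter rows, then sum a column'."""
    table: str
    sum_column: str
    filter_column: str
    min_value: float


def run_in_python(query, tables):
    """Backend 1: evaluate the query over lists of plain Python dicts."""
    return sum(
        row[query.sum_column]
        for row in tables[query.table]
        if row[query.filter_column] >= query.min_value
    )


def run_in_sqlite(query, connection):
    """Backend 2: compile the same query to SQL and let SQLite execute it."""
    sql = (
        f'SELECT COALESCE(SUM("{query.sum_column}"), 0) '
        f'FROM "{query.table}" '
        f'WHERE "{query.filter_column}" >= ?'
    )
    return connection.execute(sql, (query.min_value,)).fetchone()[0]


rows = [
    {"city": "Oslo", "sales": 10, "year": 2022},
    {"city": "Lima", "sales": 20, "year": 2023},
    {"city": "Pune", "sales": 30, "year": 2024},
]

# Load the same rows into an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (city TEXT, sales INTEGER, year INTEGER)")
connection.executemany("INSERT INTO orders VALUES (:city, :sales, :year)", rows)

# The query is written once and knows nothing about either engine.
query = Query(table="orders", sum_column="sales", filter_column="year", min_value=2023)

python_result = run_in_python(query, {"orders": rows})
sqlite_result = run_in_sqlite(query, connection)
print(python_result, sqlite_result)
```

Because the query object carries no engine-specific logic, swapping the backend does not require rewriting the analysis code, which is the property a portable dataframe API aims for.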

This week’s episode is brought to you by Posit Connect.

Course Spotlight: Unleashing the Power of the Console With Rich

Rich is a powerful library for creating text-based user interfaces (TUIs) in Python. It enhances code readability by pretty-printing complex data structures and adds visual appeal with colored text, tables, animations, and more.

Topics:

  • 00:00:00 – Introduction
  • 00:02:26 – Dealing with limitations in early data science
  • 00:04:53 – Making pandas open source
  • 00:07:10 – Making changes to an existing platform
  • 00:12:34 – Decoupling storage and computation
  • 00:23:04 – Sponsor: Posit Connect
  • 00:23:54 – Apache Arrow solving multiple issues
  • 00:27:40 – DuckDB efficient analytic SQL database
  • 00:30:24 – Polars dataframe library
  • 00:31:04 – pandas 2.0 adding Arrow
  • 00:35:56 – Video Course Spotlight
  • 00:37:20 – Apache Software Foundation background
  • 00:41:29 – Shifting from developer to organizer and collaborator
  • 00:45:56 – Creating a portable query layer with Ibis
  • 00:55:34 – Casualties of the language wars
  • 00:57:57 – What’s your role at Posit?
  • 01:01:23 – What are you excited about in the world of Python?
  • 01:04:52 – What do you want to learn next?
  • 01:06:21 – How can people follow your work online?
  • 01:08:20 – Thanks and goodbye

Show Links:

  • Wes McKinney - Personal Website
  • Wes McKinney - The Road to Composable Data Systems: Thoughts on the Last 15 Years and the Future
  • Wes McKinney - Leveling Up the Data Stack: Thoughts on the Last 15 Years - YouTube
  • Apache Hadoop
  • Cloudera - The hybrid data company
  • Wes McKinney - Apache Arrow and the “10 Things I Hate About pandas”
  • Voltron Data - The Leading Designer and Builder of Enterprise Data Systems
  • Apache Arrow
  • DuckDB - An in-process SQL OLAP database management system
  • DuckDB-Wasm - Efficient Analytical SQL in the Browser
  • Polars - Dataframes for the new era
  • pandas 2.2.0 documentation
  • Episode #167: Exploring pandas 2.0 & Targets for Apache Arrow – The Real Python Podcast
  • ASF - Welcome to The Apache Software Foundation!
  • Ursa Labs Blog
  • Ibis - The Portable Python dataframe Library
  • Python dataframe interchange protocol
  • Hadley Wickham
  • Rust Programming Language
  • italki - Best language learning app with certificated tutors
  • Wes McKinney - LinkedIn
  • Wes McKinney (@wesmckinn) - X
  • Posit - The Open-Source Data Science Company

Level up your Python skills with our expert-led courses:

  • Data Cleaning With pandas and NumPy
  • Unleashing the Power of the Console With Rich
  • The pandas DataFrame: Working With Data Efficiently

Support the podcast & join our community of Pythonistas
