The Real Python Podcast

Exploring pandas 2.0 & Targets for Apache Arrow


What are the new ways to describe your data in pandas 2.0? Will the addition of Apache Arrow to the data back end foster the growth of data interoperability? This week on the show, we talk with pandas core developer Marc Garcia about the release of pandas 2.0.

Marc shares his background and work on pandas. We discuss the history of data representation in pandas and the need to move beyond NumPy. We also talk about how Apache Arrow only solves some of the issues.

We dig into the potential of an Apache Arrow back end and how it could offer interoperability between data platforms. We also cover concerns about moderate adoption and backward compatibility. Finally, Marc shares his thoughts on making pandas more extensible.

Course Spotlight: The pandas DataFrame: Working With Data Efficiently

In this course, you’ll get started with pandas DataFrames, which are powerful and widely used two-dimensional data structures. You’ll learn how to perform basic operations with data, handle missing values, work with time-series data, and visualize data from a pandas DataFrame.

Topics:

  • 00:00:00 – Introduction
  • 00:02:07 – Getting involved with the pandas project
  • 00:03:48 – Continued growth of the platform
  • 00:06:49 – Parallel branch development
  • 00:09:19 – The introduction of Apache Arrow
  • 00:18:53 – Working with NumPy data in pandas
  • 00:30:18 – Arrow data types and strings
  • 00:41:23 – Video Course Spotlight
  • 00:42:37 – Interoperability of Arrow data back end
  • 00:50:36 – Could pandas be more extensible?
  • 01:00:49 – Python DataFrame Summit 2023
  • 01:08:12 – What are you excited about in the world of Python?
  • 01:11:13 – What do you want to learn next?
  • 01:12:12 – How can people follow your work online?
  • 01:13:46 – Thanks and Goodbye

Show Links:

  • Marc Garcia - datapythonista - data engineer, data scientist and pandas core developer
  • pandas 2.0 and the Arrow revolution (part I)
  • The pandas of the future - Marc Garcia - SciPyLA 2019 - TubEdu
  • The deadly consequences of rounding errors - Slate
  • Community Blog - pandas - Python Data Analysis Library
  • Apache Arrow - Apache Arrow
  • Apache Arrow and the “10 Things I Hate About pandas” - Wes McKinney
  • I/O Extensions in pandas - PDEP-9
  • Extension Arrays for Pandas - Tom’s Blog
  • Python Dataframe Summit 2023
  • Rust Programming Language
  • Freediving - Wikipedia
  • Marc Garcia - LinkedIn
  • Marc Garcia (@datapythonista) - X

Level up your Python skills with our expert-led courses:

  • Reading and Writing Files With pandas
  • Explore Your Dataset With pandas
  • The pandas DataFrame: Working With Data Efficiently

Support the podcast & join our community of Pythonistas
