The Real Python Podcast

Speeding Up Your DataFrames With Polars



How can you get more performance from your existing data science infrastructure? What if a DataFrame library could take advantage of your machine’s available cores and provide built-in methods for handling larger-than-RAM datasets? This week on the show, Liam Brannigan is here to discuss Polars.

Liam is an experienced data scientist working in finance, technology, and environmental analysis. He’s recently started contributing to the documentation for Polars and developing a training course for the library.

We talk about the library’s overall speed and lack of additional dependencies. Liam explains the advantages of lazy vs eager mode and which to choose when performing data exploration or attempting to load a dataset larger than your RAM.

We also discuss potential barriers to switching to Polars from a pandas workflow. Across our conversation, we explore several other libraries and technologies, including Apache Arrow, DuckDB, query optimization, and the “rustification” of Python tools.

Course Spotlight: Graph Your Data With Python and ggplot

In this course, you’ll learn how to use ggplot in Python to build data visualizations with plotnine. You’ll discover what a grammar of graphics is and how it can help you create plots in a very concise and consistent way.

Show Topics:

  • 00:00:00 – Introduction
  • 00:02:06 – Liam’s background and intro to Polars
  • 00:03:37 – Hurdles to switching to Polars
  • 00:05:23 – Creating training resources
  • 00:08:15 – No index
  • 00:09:46 – Data science 2025 predictions
  • 00:12:02 – Contributions to Polars
  • 00:15:07 – Eager vs lazy mode & query optimization
  • 00:19:25 – Sponsor: Anaconda Nucleus
  • 00:20:00 – Apache Arrow and Parquet
  • 00:24:43 – DuckDB and column orientation
  • 00:29:27 – The “rustification” of libraries
  • 00:34:49 – Video Course Spotlight
  • 00:36:16 – GPUs and memory requirements
  • 00:45:49 – No additional library requirements
  • 00:47:37 – Development of the ecosystem
  • 00:51:33 – Chaining operations
  • 00:53:39 – How can people follow your work?
  • 00:54:51 – What are you excited about in the world of Python?
  • 00:56:09 – What do you want to learn next?
  • 00:56:58 – Thanks and goodbye

Show Links:

  • Liam Brannigan - Data Scientist
  • Polars
  • polars - PyPI
  • Coming from Pandas - Polars - User Guide
  • Rho-Signal Data Analytics - YouTube
  • Cheatsheet for Pandas to Polars - Rho Signal
  • Data Analysis with Polars - Udemy
  • I wrote one of the fastest DataFrame libraries - Polars
  • Database-like ops benchmark comparison
  • Data science 2025 - Liam Brannigan
  • DuckDB - An in-process SQL OLAP database management system
  • The great Python DataFrame showdown, part 1: Demystifying Apache Arrow
  • Apache Arrow
  • Learn Rust - Rust Programming Language
  • Modern Polars
  • Anaconda - PyScript Updates: Bytecode Alliance, Pyodide, and MicroPython
  • Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R Scripts
  • Polars up and running - Liam Brannigan
  • Liam Brannigan - Data Scientist - Blog
  • Liam Brannigan (@braaannigan) - Twitter
  • Liam Brannigan - LinkedIn

Level up your Python skills with our expert-led courses:

  • Threading in Python
  • Reading and Writing Files With pandas
  • Graph Your Data With Python and ggplot

Support the podcast & join our community of Pythonistas

The Real Python Podcast by Real Python
