The Real Python Podcast

Packaging Data Analyses & Using pandas GroupBy



What are the best practices for organizing data analysis projects in Python? What are the advantages of a more package-centric approach to data science? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder’s Weekly articles and projects.

We discuss Joshua Cook’s recent article “How I Use Python to Organize My Data Analyses.” The article covers how his process for building data analysis projects has evolved and now incorporates modern Python packaging techniques.

Christopher shares his recent video course on grouping real-world data with pandas. The course offers a quick refresher before digging into how to use pandas GroupBy to manipulate, transform, and summarize data.

We also share several other articles and projects from the Python community, including a news roundup, working with JSON data in Python, running an Asyncio event loop in a separate thread, knowing the why behind a system’s code, a retro game engine for Python, and a project for vendorizing packages from PyPI.

This episode is sponsored by Mailtrap.

Course Spotlight: pandas GroupBy: Grouping Real World Data in Python

In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.

Topics:

  • 00:00:00 – Introduction
  • 00:02:18 – Setuptools Breaks Things, Then Fixes Them
  • 00:04:57 – PEP 751: A File Format to List Python Dependencies
  • 00:07:04 – Python 3.13.0 Release Candidate 1 Released
  • 00:07:15 – Python Insider: Python 3.12.5 released
  • 00:07:22 – Django 5.1 released - Django Weblog
  • 00:07:27 – Django security releases issued: 5.0.8 and 4.2.15
  • 00:07:49 – How I Use Python to Organize My Data Analyses
  • 00:13:45 – Sponsor: Mailtrap
  • 00:14:21 – pandas GroupBy: Grouping Real World Data in Python
  • 00:20:33 – Working With JSON Data in Python
  • 00:25:01 – Asyncio Event Loop in Separate Thread
  • 00:30:33 – Video Course Spotlight
  • 00:31:47 – Habits of great software engineers
  • 00:49:17 – pyxel: A Retro Game Engine for Python
  • 00:52:36 – python-vendorize: Vendorize Packages From PyPI
  • 00:54:18 – Thanks and goodbye

News:

  • Setuptools Breaks Things, Then Fixes Them – This post is Bite Code’s monthly summary, but the lead story happened just days ago. Following through on a seven-year-old deprecation, setuptools finally removed the ability to call its test command. Many packages promptly broke, and the change was undone the following day.
  • PEP 751: A File Format to List Python Dependencies for Installation Reproducibility (New) – This PEP proposes a new file format for dependency specification to enable reproducible installation in a Python environment.
  • Python 3.13.0 Release Candidate 1 Released
  • Python Insider: Python 3.12.5 released
  • Django 5.1 released - Django Weblog
  • Django security releases issued: 5.0.8 and 4.2.15 - Django Weblog

Show Links:

  • How I Use Python to Organize My Data Analyses – This is a description of how Joshua uses Python in a package-centric way to organize his approach to data analyses. This is a system he has evolved while working on his computational biology Ph.D. and working in industry.
  • pandas GroupBy: Grouping Real World Data in Python – In this course, you’ll learn how to work adeptly with the pandas GroupBy while mastering ways to manipulate, transform, and summarize data. You’ll work with real-world datasets and chain GroupBy methods together to get data into an output that suits your needs.
  • Working With JSON Data in Python – In this tutorial, you’ll learn how to read and write JSON-encoded data in Python. You’ll begin with practical examples that show how to use Python’s built-in “json” module and then move on to learn how to serialize and deserialize custom data.
  • Asyncio Event Loop in Separate Thread – Typically, the asyncio event loop runs in the main thread, but as that is the one used by the interpreter, sometimes you want the event loop to run in a separate thread. This article talks about why and how to do just that.
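
For readers who want a feel for the asyncio item before reading the linked article, here is a minimal sketch of one common way to run an event loop in a separate thread using only the standard library. It is not taken from the article and may differ from the approach described there. The names add_later, start_background_loop, and stop_background_loop are hypothetical helpers made up for this illustration.

```python
import asyncio
import threading


async def add_later(a: int, b: int) -> int:
    """Hypothetical coroutine standing in for real asynchronous work."""
    await asyncio.sleep(0.01)
    return a + b


def start_background_loop():
    """Create a new event loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()

    def run() -> None:
        # Bind the loop to this worker thread, then block here until stopped.
        asyncio.set_event_loop(loop)
        loop.run_forever()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return loop, thread


def stop_background_loop(loop, thread) -> None:
    """Ask the loop to stop from outside its thread, then clean up."""
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


loop, thread = start_background_loop()

# Submit a coroutine from the main thread; this returns a
# concurrent.futures.Future that can be waited on synchronously.
future = asyncio.run_coroutine_threadsafe(add_later(2, 3), loop)
result = future.result(timeout=5)
print(result)

stop_background_loop(loop, thread)
```

The main thread stays free for synchronous code and only blocks when it asks for a result. Calls that touch the loop from outside its thread go through run_coroutine_threadsafe or call_soon_threadsafe, since most other loop methods are not thread-safe.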

Discussion:

  • Habits of great software engineers

Projects:

  • pyxel: A Retro Game Engine for Python
  • python-vendorize: Vendorize Packages From PyPI

Additional Links:

  • Everyday Project Packaging With pyproject.toml – Real Python
  • Packaging Your Python Code With pyproject.toml - Complete Code Conversation - YouTube
  • Episode #197: Using Python in Bioinformatics and the Laboratory – The Real Python Podcast

Level up your Python skills with our expert-led courses:

  • Everyday Project Packaging With pyproject.toml
  • Working With JSON in Python
  • pandas GroupBy: Grouping Real World Data in Python

Support the podcast & join our community of Pythonistas


