The Real Python Podcast

Orchestrating Large and Small Projects With Apache Airflow



Have you worked on a project that needed an orchestration tool? How do you define the workflow of an entire data pipeline or a messaging system with Python? This week on the show, Calvin Hendryx-Parker is back to talk about using Apache Airflow and orchestrating Python projects.

Calvin is the co-founder and CTO of Six Feet Up and a Python Web Conference co-organizer. He’s recently been working on a massive project that requires thousands of jobs involving transferring and transforming data. Through his research into orchestration systems, he found Apache Airflow.

Airflow is an open-source tool to define, schedule, and monitor workflows. The platform is pure Python and integrates with a wide variety of services. We discuss how workflows are defined by creating directed acyclic graphs (DAGs).
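As a rough illustration of the DAG idea, here is a minimal sketch that uses only Python's standard library (`graphlib`, Python 3.9+). It is not Airflow's API. The task names and the `run_order` and `is_acyclic` helpers are hypothetical, chosen only to show how dependencies between tasks determine a valid execution order and why cycles are not allowed.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, and each value is the set of
# tasks that must finish before that task can start.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform"},
    "report": {"load", "validate"},
}


def run_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """Return True if the graph has no cycles, so it can be scheduled."""
    try:
        run_order(graph)
    except CycleError:
        return False
    return True


order = run_order(pipeline)
print(order)
```

Because `validate` and `load` depend only on `transform`, either may come first. The "acyclic" part matters because a task that indirectly depends on itself could never be scheduled.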

Calvin talks about how a recent project ran into the limitations of Airflow and DAG factories, and how his team built a clever solution using Python. We also discuss the upcoming Python Web Conference and what virtual attendees can expect.

Course Spotlight: Python Basics: Object-Oriented Programming

In this video course, you’ll get to know OOP, or object-oriented programming. You’ll learn how to create a class, use classes to create new objects, and instantiate classes with attributes.

Topics:

  • 00:00:00 – Introduction
  • 00:02:24 – Describing the large data pipeline
  • 00:04:38 – What format was the data in?
  • 00:06:04 – Was the format of the data changed for storage?
  • 00:09:34 – Data engineering and describing sources and targets
  • 00:11:29 – Apache Airflow orchestration and hitting limitations
  • 00:18:12 – Sponsor: CData Software
  • 00:18:54 – DAG: Directed acyclic graphs
  • 00:22:29 – Streaming data and other tool choices
  • 00:25:38 – Overcoming DAG Factory limitations
  • 00:31:49 – Another industry example for Airflow
  • 00:34:24 – Finding solutions as a consultancy
  • 00:35:12 – Is there a minimum-size project for Airflow?
  • 00:37:37 – Django under the hood
  • 00:38:31 – Video Course Spotlight
  • 00:39:58 – The Python Web Conference 2023
  • 00:44:24 – Do you have any upcoming conference talks?
  • 00:45:53 – How can people follow your work online?
  • 00:46:52 – IndyPy talk by Mariatta Wijaya
  • 00:48:01 – What are you excited about in the world of Python?
  • 00:51:45 – What do you want to learn next?
  • 00:53:22 – Thanks and goodbye
Show Links:

  • Apache Airflow - Documentation
  • Too Big for DAG Factories? — Six Feet Up
  • Directed acyclic graph - Wikipedia
  • DAGs — Airflow Documentation
  • Dynamically generating DAGs in Airflow - Astronomer Documentation
  • Data Lakehouse Architecture and AI Company - Databricks
  • Episode #10: Python Job Hunting in a Pandemic – The Real Python Podcast
  • Episode #124: Exploring Recursion in Python With Al Sweigart – The Real Python Podcast
  • The Recursive Book of Recursion
  • Episode #61: Scaling Data Science and Machine Learning Infrastructure Like Netflix – The Real Python Podcast
  • IndyPy — Indiana Python User Group
  • Contributing to Python - Mariatta Wijaya - Python Core Developer - YouTube
  • Home Assistant
  • Arturia - MicroFreak
  • Arturia - Pigments
  • CalvinHP - Fosstodon
  • calvinhp - Twitter
  • Six Feet Up - Blog
  • Python Web Conference 2023

Level up your Python skills with our expert-led courses:

  • Data Cleaning With pandas and NumPy
  • Python Basics: Object-Oriented Programming
  • Intro to Object-Oriented Programming (OOP) in Python

Support the podcast & join our community of Pythonistas
