The Python Podcast.__init__

Great Expectations For Your Data Pipelines with Abe Gong and James Campbell



Summary

Testing is a critical activity in all software projects, but one that is often neglected in data pipelines. The inherent statefulness of the problem domain and the interdependencies between systems make pipeline testing difficult to manage. To make this endeavor more manageable, Abe Gong and James Campbell created Great Expectations. In this episode they discuss how you can use the project to create tests in the exploratory phase of building a pipeline and then leverage those tests to monitor your systems in production. They also discuss how Great Expectations works, the difficulties associated with pipeline testing and managing the associated technical debt, and their future plans for the project.
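
To make the idea in the summary concrete, here is a minimal sketch of writing checks during exploration and reusing the same checks on a production batch. It uses only the Python standard library and is not the Great Expectations API: every function name, field name, and sample row below is a hypothetical stand-in chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Result = Dict[str, Any]
# A check is a function plus the keyword arguments it should be called with.
Check = Tuple[Callable[..., Result], Dict[str, Any]]


def expect_values_not_null(rows: List[Row], column: str) -> Result:
    """Hypothetical check: no row may have a missing value in `column`."""
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": "not_null", "column": column,
            "success": not unexpected, "unexpected_rows": unexpected}


def expect_values_between(rows: List[Row], column: str,
                          min_value: float, max_value: float) -> Result:
    """Hypothetical check: present values must fall within [min_value, max_value]."""
    unexpected = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return {"check": "between", "column": column,
            "success": not unexpected, "unexpected_rows": unexpected}


def validate(rows: List[Row], suite: List[Check]) -> Result:
    """Run every check in the suite against one batch of rows."""
    results = [check(rows, **kwargs) for check, kwargs in suite]
    return {"success": all(r["success"] for r in results), "results": results}


# Checks written once, while exploring a sample of the data (made-up rows).
suite: List[Check] = [
    (expect_values_not_null, {"column": "user_id"}),
    (expect_values_between, {"column": "age", "min_value": 0, "max_value": 120}),
]

exploration_batch = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 51}]
production_batch = [{"user_id": 3, "age": 29}, {"user_id": None, "age": 212}]

exploration_report = validate(exploration_batch, suite)
production_report = validate(production_batch, suite)

for name, report in [("exploration", exploration_report),
                     ("production", production_report)]:
    print(name, "passed" if report["success"] else "failed")
```

Keeping the suite as plain data, separate from the batches it runs against, is what allows the same checks to serve as both exploratory tests and production monitors in this sketch.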

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • Finding a bug in production is never a fun experience, especially when your users find it first. Airbrake error monitoring ensures that you will always be the first to know so you can deploy a fix before anyone is impacted. With open source agents for Python 2 and 3 it’s easy to get started, and the automatic aggregations, contextual information, and deployment tracking ensure that you don’t waste time pinpointing what went wrong. Go to podcastinit.com/airbrake today to sign up and get your first 30 days free, and 50% off 3 months of the Startup plan.
  • To get worry-free releases download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • Your host as usual is Tobias Macey, and today I’m interviewing James Campbell and Abe Gong about Great Expectations, a tool for testing the data in your analytics pipelines.

Interview
  • Introduction
  • How did you first get introduced to Python?
  • What is Great Expectations and what was your motivation for starting it?
  • What are some of the complexities associated with testing analytics pipelines?
    • What types of tests can be executed to ensure data integrity and accuracy?
  • What are some examples of the potential impact of pipeline debt?
  • What is Great Expectations and how does it simplify the process of building and executing pipeline tests?
  • What are some examples of the types of tests that can be built with Great Expectations?
  • For someone getting started with Great Expectations what does the workflow look like?
  • What was your reason for using Python for building it?
    • How does the choice of language benefit or hinder the contexts in which Great Expectations can be used?
  • What are some cases where Great Expectations would not be usable or useful?
  • What have been some of the most challenging aspects of building and using Great Expectations?
  • What are your hopes for Great Expectations going forward?

Contact Info
  • James
    • jpcampb2 on GitHub
  • Abe
    • abegong on GitHub
    • Website
    • @AbeGong on Twitter

Picks
  • Tobias
    • Fitbit Versa
  • James
    • Unplug and spend some time away from the computer
  • Abe
    • Superconductive Health
    • Slack: Getting Past Burnout, Busy Work, and the Myth of Total Efficiency

Links
  • Superconductive Health
  • Laboratory for Analytical Sciences
  • Great Expectations
  • Medium Post
  • DAG (Directed Acyclic Graph)
  • SLA (Service Level Agreement)
  • Integration Testing
  • Data Engineering
  • Histogram
  • Pandas
  • SQLAlchemy
  • Tutorial Videos
  • Jupyter Notebooks
  • Dataframe
  • Airflow
  • Luigi
  • Spark
  • Oozie
  • Azkaban
  • JSON
  • XML

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


The Python Podcast.__init__, by Tobias Macey
