The Python Podcast.__init__

Great Expectations For Your Data Pipelines with Abe Gong and James Campbell



Summary

Testing is a critical activity in all software projects, but one that is often neglected in data pipelines. The inherent statefulness of the problem domain and the interdependencies between systems make pipeline testing difficult to manage. To make this endeavor more manageable, Abe Gong and James Campbell created Great Expectations. In this episode they discuss how you can use the project to create tests in the exploratory phase of building a pipeline and leverage those tests to monitor your systems in production. They also discuss how Great Expectations works, the difficulties associated with pipeline testing and managing the associated technical debt, and their future plans for the project.
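
To make the idea of a pipeline test concrete, here is a minimal sketch in plain Python of declaring checks once and running them against each batch of data. It only illustrates the general approach described above and is not the Great Expectations API: the function names, the result dictionary layout, and the sample columns (`user_id`, `age`) are all hypothetical.

```python
# Hypothetical helpers for illustration only; NOT the Great Expectations API.

def expect_values_not_null(rows, column):
    """Check that every row has a non-null value in `column`."""
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "expectation": "values_not_null",
        "column": column,
        "success": not unexpected,
        "unexpected_index_list": unexpected,
    }


def expect_values_between(rows, column, min_value, max_value):
    """Check that every non-null value in `column` is within [min_value, max_value]."""
    unexpected = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return {
        "expectation": "values_between",
        "column": column,
        "success": not unexpected,
        "unexpected_index_list": unexpected,
    }


def validate(rows, suite):
    """Run every (check, kwargs) pair in the suite against one batch of rows."""
    return [check(rows, **kwargs) for check, kwargs in suite]


# The suite is written once, for example while exploring the data,
# and can then be reused on every new batch in production.
suite = [
    (expect_values_not_null, {"column": "user_id"}),
    (expect_values_between, {"column": "age", "min_value": 0, "max_value": 120}),
]

# Made-up sample batch with one null user_id and one out-of-range age.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": 212},
]

results = validate(batch, suite)
failed = [r for r in results if not r["success"]]
for result in failed:
    print(result["expectation"], result["column"], result["unexpected_index_list"])
```

Keeping the checks as data (a list of function and argument pairs) is one way to let the same suite serve both exploratory work and production monitoring; the real project may organize this differently.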

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • Finding a bug in production is never a fun experience, especially when your users find it first. Airbrake error monitoring ensures that you will always be the first to know so you can deploy a fix before anyone is impacted. With open source agents for Python 2 and 3 it’s easy to get started, and the automatic aggregations, contextual information, and deployment tracking ensure that you don’t waste time pinpointing what went wrong. Go to podcastinit.com/airbrake today to sign up and get your first 30 days free, and 50% off 3 months of the Startup plan.
  • To get worry-free releases download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • Your host as usual is Tobias Macey, and today I’m interviewing James Campbell and Abe Gong about Great Expectations, a tool for testing the data in your analytics pipelines.

Interview
  • Introduction
  • How did you first get introduced to Python?
  • What is Great Expectations and what was your motivation for starting it?
  • What are some of the complexities associated with testing analytics pipelines?
    • What types of tests can be executed to ensure data integrity and accuracy?
  • What are some examples of the potential impact of pipeline debt?
  • What is Great Expectations and how does it simplify the process of building and executing pipeline tests?
  • What are some examples of the types of tests that can be built with Great Expectations?
  • For someone getting started with Great Expectations what does the workflow look like?
  • What was your reason for using Python for building it?
    • How does the choice of language benefit or hinder the contexts in which Great Expectations can be used?
  • What are some cases where Great Expectations would not be usable or useful?
  • What have been some of the most challenging aspects of building and using Great Expectations?
  • What are your hopes for Great Expectations going forward?

Contact Info
  • James
    • jpcampb2 on GitHub
  • Abe
    • abegong on GitHub
    • Website
    • @AbeGong on Twitter

Picks
  • Tobias
    • Fitbit Versa
  • James
    • Unplug and spend some time away from the computer
  • Abe
    • Superconductive Health
    • Slack: Getting Past Burnout, Busy Work, and the Myth of Total Efficiency

Links
  • Superconductive Health
  • Laboratory for Analytical Sciences
  • Great Expectations
  • Medium Post
  • DAG (Directed Acyclic Graph)
  • SLA (Service Level Agreement)
  • Integration Testing
  • Data Engineering
  • Histogram
  • Pandas
  • SQLAlchemy
  • Tutorial Videos
  • Jupyter Notebooks
  • Dataframe
  • Airflow
  • Luigi
  • Spark
  • Oozie
  • Azkaban
  • JSON
  • XML

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

The Python Podcast.__init__ by Tobias Macey