Data Engineering Podcast

An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem



Summary

Data systems are inherently complex and often require integrating multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. That centralization gives data platform engineers a single location for visibility and error handling, which helps them keep the complexity manageable. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and how it is applied, to help inform how you implement it in your environment.
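
To make the idea of "execution and sequencing of interdependent operations" concrete, here is a minimal sketch in Python. It is not how Dagster or any other orchestrator discussed in the episode is implemented; the `run_pipeline` function and the task names are hypothetical, and it only uses the standard library's `graphlib.TopologicalSorter` to order tasks by their dependencies while collecting failures in one place.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after its upstream tasks and report results centrally.

    tasks:        dict mapping a task name to a zero-argument callable
    dependencies: dict mapping a task name to the set of names it depends on
    Both structures are illustrative assumptions, not a real orchestrator API.
    """
    ran = []       # tasks that completed, in execution order
    failed = {}    # task name -> exception raised by that task
    skipped = []   # tasks not run because something upstream did not succeed

    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(dep in failed or dep in skipped for dep in upstream):
            skipped.append(name)
            continue
        try:
            tasks[name]()
            ran.append(name)
        except Exception as exc:  # single location for error handling
            failed[name] = exc
    return ran, failed, skipped


if __name__ == "__main__":
    # Hypothetical three-step pipeline: extract -> transform -> load.
    example_tasks = {
        "extract": lambda: None,
        "transform": lambda: None,
        "load": lambda: None,
    }
    example_deps = {"transform": {"extract"}, "load": {"transform"}}
    print(run_pipeline(example_tasks, example_deps))
```

A real orchestrator layers scheduling, retries, parallelism, and observability on top of this ordering step; the sketch only shows why a dependency graph gives one central place to decide what runs, what is skipped, and what failed.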

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
  • This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold
  • You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
  • Your host is Tobias Macey and today I'm welcoming back Nick Schrock to talk about the state of the ecosystem for data orchestration

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by defining what data orchestration is and how it differs from other types of orchestration systems? (e.g. container orchestration, generalized workflow orchestration, etc.)
  • What are the misconceptions about the applications of/need for/cost to implement data orchestration?
    • How do those challenges of customer education change across roles/personas?
  • Because of the multi-faceted nature of data in an organization, how does that influence the capabilities and interfaces that are needed in an orchestration engine?
  • You have been working on Dagster for five years now. How have the requirements/adoption/application for orchestrators changed in that time?
  • One of the challenges for any orchestration engine is to balance the need for robust and extensible core capabilities with a rich suite of integrations to the broader data ecosystem. What are the factors that you have seen make the most influence in driving adoption of a given engine?
  • What are the most interesting, innovative, or unexpected ways that you have seen data orchestration implemented and/or used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on data orchestration?
  • When is a data orchestrator the wrong choice?
  • What do you have planned for the future of orchestration with Dagster?

Contact Info
  • @schrockn on Twitter
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links
  • Dagster
  • GraphQL
  • K8s == Kubernetes
  • Airbyte
    • Podcast Episode
  • Hightouch
    • Podcast Episode
  • Airflow
  • Prefect
  • Flyte
    • Podcast Episode
  • dbt
    • Podcast Episode
  • DAG == Directed Acyclic Graph
  • Temporal
  • Software Defined Assets
  • DataForm
  • Gradient Flow State Of Orchestration Report 2022
  • MLOps Is 98% Data Engineering
  • DataHub
    • Podcast Episode
  • OpenMetadata
    • Podcast Episode
  • Atlan
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

  • RudderStack: [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
  • Materialize: Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access) today and get 2 weeks free!
  • Datafold: [dataengineeringpodcast.com/datafold](https://www.dataengineeringpodcast.com/datafold)


