Data Engineering Podcast

Evolving An ETL Pipeline For Better Productivity



Summary

Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components onto a paid service. He explains how their original implementation was built, why they decided to migrate to a paid service, and how they made that transition. He also discusses how the abstractions provided by DataCoral allow his data scientists to remain productive without requiring dedicated data engineers. If you are either considering how to build a data pipeline or debating whether to migrate your existing ETL to a service, this is definitely worth listening to for some perspective.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new pipelines and tuning their workflows, you need a project management system designed by engineers, for engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Data Engineering Podcast listeners get 2 months free on any plan by going to dataengineeringpodcast.com/clubhouse today and signing up for a free trial. Support the show and get your data projects in order!
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Aaron Gibralter and Raghu Murthy about the experience of Greenhouse migrating their data pipeline to DataCoral

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Aaron, can you start by describing what Greenhouse is and some of the ways that you use data?
  • Can you describe your overall data infrastructure and the state of your data pipeline before migrating to DataCoral?
    • What are your primary sources of data and what are the targets that you are loading them into?
    • What were your biggest pain points and what motivated you to re-evaluate your approach to ETL?
  • What were your criteria for your replacement technology and how did you gather and evaluate your options?
  • Once you made the decision to use DataCoral can you talk through the transition and cut-over process?
    • What were some of the unexpected edge cases or shortcomings that you experienced when moving to DataCoral?
    • What were the big wins?
    • What was your evaluation framework for determining whether your re-engineering was successful?
  • Now that you are using DataCoral how would you characterize the experiences of yourself and your team?
    • If you have freed up time for your engineers, how are you allocating that spare capacity?
  • What do you hope to see from DataCoral in the future?
  • What advice do you have for anyone else who is either evaluating a re-architecture of their existing data platform or planning out a greenfield project?

Contact Info
  • Aaron
    • agribralter on GitHub
    • LinkedIn
  • Raghu
    • LinkedIn
    • Medium

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
  • Greenhouse
    • We’re hiring Data Scientists and Software Engineers!
  • Datacoral
  • Airflow
    • Podcast.init Interview
    • Data Engineering Interview about running Airflow in production
  • Periscope Data
  • Mode Analytics
  • Data Warehouse
  • ETL
  • Salesforce
  • Zendesk
  • Jira
  • DataDog
  • Asana
  • GDPR
  • Metabase
    • Podcast Interview

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

