Data Engineering Podcast

Exploring The Insights And Impact Of Dan Delorey's Distinguished Career In Data



Summary

Dan Delorey helped to build the core technologies of Google's cloud data services for many years before embarking on his latest adventure as the VP of Data at SoFi. From being an early engineer on the Dremel project, to helping launch and manage BigQuery, to helping enterprises adopt Google's data products, he learned all of the critical details of how to run services used by data platform teams. Now he is a consumer of many of the tools that his work inspired. In this episode he takes a trip down memory lane to weave an interesting and informative narrative about the broader themes throughout his work and their echoes in the modern data ecosystem.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
  • So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at dataengineeringpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
  • Your host is Tobias Macey and today I'm interviewing Dan Delorey about his journey through the data ecosystem as the current head of data at SoFi, prior engineering leader with the BigQuery team, and early engineer on Dremel.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by sharing what your current relationship to the data ecosystem is and the CliffsNotes version of how you ended up there?
  • Dremel was a groundbreaking technology at the time. What do you see as its lasting impression on the landscape of data both in and outside of Google?
  • You were instrumental in crafting the vision behind "querying data in place" (what was called federated data) at Dremel and BigQuery. What do you mean by this? How has this approach evolved? What are some challenges with this approach?
  • How well did the Drill project capture the core principles of Dremel as outlined in the eponymous white paper?
  • Following your work on Dremel you were involved with the development and growth of BigQuery and the broader suite of Google Cloud's data platform. What influence do you see those tools having had on the evolution of the broader data ecosystem?
  • How have your experiences at Google influenced your approach to platform and organizational design at SoFi?
  • What's in SoFi's data stack? How do you decide which technologies to buy vs. build in-house?
  • How does your team solve for data quality and governance?
  • What are the dominant factors that you consider when deciding on project/product priorities for your team?
  • When you're not building industry-defining data tooling or leading data strategy, you spend time thinking about the ethics of data. Can you elaborate a bit on your research and interest there?
  • You also have some ideas about data marketplaces, which is a hot topic these days with companies like Snowflake and Databricks breaking into this economy. What's your take on the evolution of this space?
  • What are the most interesting, innovative, or unexpected data systems that you have encountered?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on building and supporting data systems?
  • What are the areas that you are paying the most attention to?
  • What interesting predictions do you have for the future of data systems and their applications?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • SoFi
  • BigQuery
  • Dremel
  • Brigham Young University
  • Empirical Software Engineering
  • MapReduce
  • Hadoop
  • Sawzall
  • VLDB Test Of Time Award Paper
  • GFS
  • Colossus
  • Partitioned Hash Join
  • Google Bigtable
  • HBase
  • AWS Athena
  • Snowflake
    • Podcast Episode
  • Data Vault
  • Star Schema
  • Privacy Vault
  • Homomorphic Encryption

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast by Tobias Macey
