Data Engineering Podcast

Building The Foundations For Data Driven Businesses at 5xData


Summary

Every business aims to be data driven, but not all of them succeed in that effort. To derive real insight from the data it collects, an organization needs certain foundational capabilities. To help more businesses build those foundations, Tarush Aggarwal created 5xData, which offers collaborative workshops to assist in setting up the necessary technical and organizational systems. In this episode he shares his thoughts on the core elements that every business needs in order to be data driven, how he is helping companies incorporate those capabilities into their structure, and the ongoing support that he provides through a network of mastermind groups. This is a great conversation about the initial steps that every group should be thinking of as they start down the road to making data-informed decisions.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage, and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature, which instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt, and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
  • RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.
  • Your host is Tobias Macey, and today I’m interviewing Tarush Aggarwal about his mission at 5xData to teach companies how to build solid foundations for their data capabilities.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of what you are building at 5xData and the story behind it?
  • Impact of industry on challenges in becoming data driven
  • Profile of companies that you are trying to work with
  • Common mistakes when designing data platform
  • Misconceptions that the business has around how to invest in data
  • Challenges in attracting/interviewing/hiring data talent
  • What are the core components that you have standardized on for building the foundational layers of the data platform?
  • Providing context and training to business users in order to allow them to self-serve the answers to their questions
    • Tooling/interfaces needed to allow them to ask and investigate questions
  • Most high impact areas for data engineers to focus on in the initial stages of implementing the data platform
  • How to identify and prioritize areas of effort
  • Useful structure of data team at different stages of maturity
  • What are the most interesting, unexpected, or challenging lessons that you have learned while building out the business and team of 5xData?
  • What do you have planned for the future of the business?
  • What are the industry trends or specific technologies that you are keeping a close watch on?

Contact Info
  • LinkedIn
  • @tarush on Twitter

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links
  • 5xData
  • Looker
    • Podcast Episode
  • Snowflake
    • Podcast Episode
  • Fivetran
    • Podcast Episode
  • DBT
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast, by Tobias Macey
