Data Engineering Podcast

Build Your Own End To End Customer Data Platform With Rudderstack



Summary

Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers, it can become even more complex. To simplify the work of managing the full flow of your customer data while keeping you in full control, the team at Rudderstack created their eponymous open source platform, which lets you work with first- and third-party data and build and manage reverse ETL workflows. In this episode, CEO and founder Soumyadeb Mitra explains how Rudderstack compares to the other tools and platforms that overlap with it, how to set it up for your own data needs, and how it is architected to scale to meet demand.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Today’s episode is sponsored by Prophecy.io – the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design & deploy data pipelines on Apache Spark & Apache Airflow. Now all data users can apply software engineering best practices – git, tests, and continuous deployment – with a simple-to-use visual designer. How does it work? You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built-in metadata search and column-level lineage. Finally, if you have existing workflows in AbInitio, Informatica, or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy, making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy.
  • The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of their data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
  • Your host is Tobias Macey and today I’m interviewing Soumyadeb Mitra about his experience as the founder of Rudderstack and its role in your data platform

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Rudderstack is and the story behind it?
  • What are the main use cases that Rudderstack is designed to support?
  • Who are the target users of Rudderstack?
    • How does the availability of the managed cloud service change the user profiles that you can target?
    • How do these user profiles influence your focus and prioritization of features and user experience?
  • How would you characterize the position of Rudderstack in the current data ecosystem?
    • What other tools/systems might you replace with Rudderstack?
    • How do you think about the application of Rudderstack compared to tools for data integration (e.g. Singer, Stitch, Fivetran) and reverse ETL (e.g. Grouparoo, Hightouch, Census)?
  • Can you describe how the Rudderstack platform is designed and implemented?
    • How have the goals/design/use cases of Rudderstack changed or evolved since you first started working on it?
    • What are the different extension points available for engineers to extend and customize Rudderstack?
  • Working with customer data is a core capability in Rudderstack. How do you manage the identity resolution of users as they transition back and forth between anonymous and identified?
    • What are some of the data privacy primitives that you include to assist with data security/regulatory concerns?
  • What is the process of getting started with Rudderstack as a software or data platform engineer?
  • What are some of the operational challenges related to running your own deployment of Rudderstack?
  • What are some of the overlooked/underemphasized capabilities of Rudderstack?
  • How have you approached the governance model/boundaries between OSS and commercial for Rudderstack?
  • What are the most interesting, innovative, or unexpected ways that you have seen Rudderstack used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Rudderstack?
  • When is Rudderstack the wrong choice?
  • What do you have planned for the future of Rudderstack?

Contact Info
  • LinkedIn
  • @soumyadeb_mitra on Twitter

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • Rudderstack
  • Hadoop
  • Spark
  • Segment
    • Podcast Episode
  • Grouparoo
    • Podcast Episode
  • Fivetran
    • Podcast Episode
  • Stitch
  • Singer
    • Podcast Episode
  • Census
    • Podcast Episode
  • Hightouch
    • Podcast Episode
  • LiveRamp
  • Airbyte
    • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA



