Data Engineering Podcast

Evolving And Scaling The Data Platform at Yotpo



Summary

Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their needs are being met. Yotpo has been on a journey to evolve and scale their data platform to continue serving the needs of their organization as it increases the scale and sophistication of data usage. In this episode, Doron Porat and Liran Yogev explain how they arrived at their current architecture, the capabilities that they are optimizing for, and the complex process of identifying and evaluating new components to integrate into their systems. This is an excellent exploration of the decisions and tradeoffs that need to be made while building such a complex system.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy-to-consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl
  • RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
  • The most important piece of any data project is the data itself, which is why it is critical that your data source is high quality. PostHog is your all-in-one product analytics suite including product analysis, user funnels, feature flags, experimentation, and it’s open source so you can host it yourself or let them do it for you! You have full control over your data and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms. Give it a try today with their generous free tier at dataengineeringpodcast.com/posthog
  • Your host is Tobias Macey and today I’m interviewing Doron Porat and Liran Yogev about their experiences designing and implementing a self-serve data platform at Yotpo

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Yotpo is and the role that data plays in the organization?
  • What are the core data types and sources that you are working with?
    • What kinds of data assets are being produced and how do those get consumed and re-integrated into the business?
  • What are the user personas that you are supporting and what are the interfaces that they are comfortable interacting with?
    • What is the size of your team and how is it structured?
  • You recently posted about the current architecture of your data platform. What was the starting point on your platform journey?
    • What did the early stages of feature and platform evolution look like?
    • What was the catalyst for making a concerted effort to integrate your systems into a cohesive platform?
  • What was the scope and directive of the project for building a platform?
    • What are the metrics and capabilities that you are optimizing for in the structure of your data platform?
    • What are the organizational or regulatory constraints that you needed to account for?
    • What are some of the early decisions that affected your available choices in later stages of the project?
  • What does the current state of your architecture look like?
    • How long did it take to get to where you are today?
  • What were the factors that you considered in the various build vs. buy decisions?
    • How did you manage cost modeling to understand the true savings on either side of that decision?
  • If you were to start from scratch on a new data platform today what might you do differently?
  • What are the decisions that proved helpful in the later stages of your platform development?
  • What are the most interesting, innovative, or unexpected ways that you have seen your platform used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on designing and implementing your platform?
  • What do you have planned for the future of your platform infrastructure?

Contact Info
  • Doron
    • LinkedIn
  • Liran
    • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • Yotpo
    • Data Platform Architecture Blog Post
  • Greenplum
  • Databricks
  • Metorikku
  • Apache Hive
  • CDC == Change Data Capture
  • Debezium
    • Podcast Episode
  • Apache Hudi
    • Podcast Episode
  • Upsolver
    • Podcast Episode
  • Spark
  • PrestoDB
  • Snowflake
    • Podcast Episode
  • Druid
  • Rockset
    • Podcast Episode
  • dbt
    • Podcast Episode
  • Acryl
    • Podcast Episode
  • Atlan
    • Podcast Episode
  • OpenLineage
    • Podcast Episode
  • Okera
  • Shopify Data Warehouse Episode
  • Redshift
  • Delta Lake
    • Podcast Episode
  • Iceberg
    • Podcast Episode
  • Outbox Pattern
  • Backstage
  • Roadie
  • Nomad
  • Kubernetes
  • Deequ
  • Great Expectations
    • Podcast Episode
  • LakeFS
    • Podcast Episode
  • 2021 Recap Episode
  • Monte Carlo
  • The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
