Data Engineering Podcast

Build More Reliable Distributed Systems By Breaking Them With Jepsen



Summary

A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to understand their limitations. This was a great look at some of the underlying principles that power your mission-critical workloads.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • If you’ve been exploring scalable, cost-effective and secure ways to collect and route data across your organization, RudderStack is the only solution that helps you turn your own warehouse into a state-of-the-art customer data platform. Their mission is to empower data engineers to fully own their customer data infrastructure and easily push value to other parts of the organization, like marketing and product management. With their open-source foundation, fixed pricing, and unlimited volume, they are enterprise ready, but accessible to everyone. Go to dataengineeringpodcast.com/rudder to request a demo and get one free month of access to the hosted platform along with a free t-shirt.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host is Tobias Macey and today I’m interviewing Kyle Kingsbury about his work on the Jepsen testing framework and the failure modes of distributed systems

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what the Jepsen project is?
    • What was your inspiration for starting the project?
  • What other methods are available for evaluating and stress testing distributed systems?
  • What are some of the common misconceptions or misunderstanding of distributed systems guarantees and how they impact real world usage of things like databases?
  • How do you approach the design of a test suite for a new distributed system?
    • What is your heuristic for determining the completeness of your test suite?
    • What are some of the common challenges of setting up a representative deployment for testing?
  • Can you walk through the workflow of setting up, running, and evaluating the output of a Jepsen test?
  • How is Jepsen implemented?
    • How has the design evolved since you first began working on it?
    • What are the pros and cons of using Clojure for building Jepsen?
    • If you were to start over today on the Jepsen framework what would you do differently?
  • What are some of the most common failure modes that you have identified in the platforms that you have tested?
  • What have you found to be the most difficult to resolve distributed systems bugs?
  • What are some of the interesting developments in distributed systems design that you are keeping an eye on?
  • How do you perceive the impact that Jepsen has had on modern distributed systems products?
  • What have you found to be the most interesting, unexpected, or challenging lessons learned while building Jepsen and evaluating mission critical systems?
  • What do you have planned for the future of the Jepsen framework?

Contact Info
  • aphyr on GitHub
  • Website

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links
  • Jepsen
  • Riak
  • Distributed Systems
  • TLA+
  • Coq
  • Isabelle
  • Cassandra DTest
  • FoundationDB
    • Podcast Episode
  • CRDT == Conflict-free Replicated Data-type
    • Podcast Episode
  • Riemann
  • Clojure
  • JVM == Java Virtual Machine
  • Kotlin
  • Haskell
  • Scala
  • Groovy
  • TiDB
  • YugabyteDB
    • Podcast Episode
  • CockroachDB
    • Podcast Episode
  • Raft consensus algorithm
  • Paxos
  • Leslie Lamport
  • Calvin
  • FaunaDB
    • Podcast Episode
  • Heidi Howard
  • CALM Conjecture
  • Causal Consistency
  • Hillel Wayne
  • Christopher Meiklejohn
  • Distsys Class
  • Distributed Systems For Fun And Profit by Mikito Takada
  • Christopher Meiklejohn Reading List
  • The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast


Data Engineering Podcast by Tobias Macey
