Data Engineering Podcast

Distributed In Memory Processing And Streaming With Hazelcast


Summary

In-memory computing provides significant performance benefits, but it brings challenges in managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the Hazelcast team has also built a streaming platform for reliable, high-throughput data transmission. In this episode Dale Kim shares how Hazelcast is implemented, the use cases that it enables, and how it complements on-disk data management systems.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Tree Schema is a data catalog that is making metadata management accessible to everyone. With Tree Schema you can create your data catalog and have it fully populated in under five minutes when using one of the many automated adapters that can connect directly to your data stores. Tree Schema includes essential cataloging features such as first-class support for both tabular and unstructured data, data lineage, rich text documentation, asset tagging and more. Built from the ground up with a focus on the intersection of people and data, your entire team will find it easier to foster collaboration around your data. With the most transparent pricing in the industry – $99/mo for your entire company – and a money-back guarantee for excellent service, you’ll love Tree Schema as much as you love your data. Go to dataengineeringpodcast.com/treeschema today to get your first month free, and mention this podcast to get 50% off your first three months after the trial.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host is Tobias Macey and today I’m interviewing Dale Kim about Hazelcast, a distributed in-memory computing platform for data-intensive applications.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what Hazelcast is and its origins?
  • What are the benefits and tradeoffs of in-memory computation for data-intensive workloads?
  • What are some of the common use cases for the Hazelcast in-memory grid?
  • How is Hazelcast implemented?
    • How has the architecture evolved since it was first created?
  • How is the Jet streaming framework architected?
    • What was the motivation for building it?
    • How do the capabilities of Jet compare to systems such as Flink or Spark Streaming?
  • How has the introduction of hardware capabilities such as NVMe drives influenced the market for in-memory systems?
  • How is the governance of the open source grid and Jet projects handled?
    • What is the guiding heuristic for which capabilities or features to include in the open source projects vs. the commercial offerings?
  • What is involved in building an application or workflow on top of Hazelcast?
  • What are the common patterns for engineers who are building on top of Hazelcast?
  • What is involved in deploying and maintaining an installation of the Hazelcast grid or Jet streaming?
  • What are the scaling factors for Hazelcast?
    • What are the edge cases that users should be aware of?
  • What are some of the most interesting, innovative, or unexpected ways that you have seen Hazelcast used?
  • When is Hazelcast Grid or Jet the wrong choice?
  • What is in store for the future of Hazelcast?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links
  • Hazelcast
  • Istanbul
  • Apache Spark
  • OrientDB
  • CAP Theorem
  • NVMe
  • Memristors
  • Intel Optane Persistent Memory
  • Hazelcast Jet
  • Kappa Architecture
  • IBM Cloud Paks
  • Digital Integration Hub (Gartner)
  • The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast by Tobias Macey