Data Engineering Podcast

Building A Better Data Warehouse For The Cloud At Firebolt



Summary

Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation of compute and storage. Firebolt is taking that a step further with a core focus on speed and interactivity. In this episode, CEO and founder Eldad Farkash explains how the Firebolt platform is architected for high throughput, how its simple and transparent pricing model encourages widespread use, and which use cases it unlocks through interactive query speeds.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company.
  • Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host is Tobias Macey and today I’m interviewing Eldad Farkash about Firebolt, a cloud data warehouse optimized for speed and elasticity on structured and semi-structured data
  • Interview
    • Introduction
    • How did you get involved in the area of data management?
    • Can you start by describing what Firebolt is and your motivation for building it?
    • How does Firebolt compare to other data warehouse technologies, and what unique features does it provide?
    • The lines between a data warehouse and a data lake have been blurring in recent years. Where on that continuum does Firebolt lie?
    • What are the unique use cases that Firebolt allows for?
    • How do the performance characteristics of Firebolt change the ways that an engineer should think about data modeling?
    • What technologies might someone replace with Firebolt?
    • How is Firebolt architected and how has the design evolved since you first began working on it?
    • What are some of the most challenging aspects of building a data warehouse platform that is optimized for speed?
    • How do you handle support for nested and semi-structured data?
    • In what ways have you found it necessary/useful to extend SQL?
    • Due to the immutability of object storage, for data lakes the update or delete process involves reprocessing a potentially large amount of data. How do you approach that in Firebolt with your F3 format?
    • What have you found to be the most interesting, unexpected, or challenging lessons while building and scaling the Firebolt platform and business?
    • When is Firebolt the wrong choice?
    • What do you have planned for the future of Firebolt?
  • Contact Info
    • LinkedIn
  • Parting Question
    • From your perspective, what is the biggest gap in the tooling or technology for data management today?
  • Closing Announcements
    • Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
    • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
    • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
    • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
    • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Links
    • Firebolt
    • Sisense
    • SnowflakeDB
      • Podcast Episode
    • Redshift
    • Spark
      • Podcast Episode
    • Parquet
      • Podcast Episode
    • Hadoop
    • HDFS
    • S3
    • AWS Athena
    • BigQuery
    • Data Vault
      • Podcast Episode
    • Star Schema
    • Dimensional Modeling
    • Slowly Changing Dimensions
    • JDBC
    • TPC Benchmarks
    • DBT
      • Podcast Episode
    • Tableau
    • Looker
      • Podcast Episode
    • PrestoSQL
      • Podcast Episode
    • PostgreSQL
      • Podcast Episode
    • FoundationDB
      • Podcast Episode
  • The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

