Data Engineering Podcast

Move Your Database To The Data And Speed Up Your Analytics With DuckDB



Summary

When you think about selecting a database engine for your project, you typically consider options focused on serving multiple concurrent users. Sometimes what you really need is an embedded database that is blazing fast for single-user workloads. DuckDB is an in-process database engine optimized for OLAP applications. It speeds up your analytical queries and meets you where you are, whether that’s Python, R, Java, or even the web. In this episode, Hannes Mühleisen, co-creator and CEO of DuckDB Labs, shares the motivations for creating the project, the myriad ways that it can be used to speed up your data projects, and the detailed engineering efforts that go into making it adaptable to any environment. This is a fascinating and humorous exploration of a truly useful piece of technology.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
  • RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
  • Your host is Tobias Macey and today I’m interviewing Hannes Mühleisen about DuckDB, an in-process embedded database engine for columnar analytics

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what DuckDB is and the story behind it?
  • Where did the name come from?
  • What are some of the use cases that DuckDB is designed to support?
  • The interface for DuckDB is similar (at least in spirit) to SQLite. What are the deciding factors for when to use one vs. the other?
    • How might they be used in concert to take advantage of their relative strengths?
  • What are some of the ways that DuckDB can be used to better effect than options provided by different language ecosystems?
  • Can you describe how DuckDB is implemented?
    • How have the design and goals of the project changed or evolved since you began working on it?
  • What are some of the optimizations that you have had to make in order to support performant access to data that exceeds available memory?
  • Can you describe a typical workflow of incorporating DuckDB into an analytical project?
  • What are some of the libraries/tools/systems that DuckDB might replace in the scope of a project or team?
  • What are some of the overlooked/misunderstood/under-utilized features of DuckDB that you would like to highlight?
  • What is the governance model and plan for long-term sustainability of the project?
  • What are the most interesting, innovative, or unexpected ways that you have seen DuckDB used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on DuckDB?
  • When is DuckDB the wrong choice?
  • What do you have planned for the future of DuckDB?

Contact Info
  • Hannes Mühleisen
  • @hfmuehleisen on Twitter
  • Website

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
  • DuckDB
  • CWI
  • SQLite
  • OLAP == Online Analytical Processing
  • Duck Typing
  • ZODB
  • Teradata
  • HTAP == Hybrid Transactional/Analytical Processing
  • Pandas
    • Podcast.__init__ Episode
  • Apache Arrow
  • Julia Language
  • Voltron Data
  • Parquet
  • Thrift
  • Protobuf
  • Vectorized Query Processor
  • LLVM
  • DuckDB Labs
  • DuckDB Foundation
  • MIT Open Courseware (OCW)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast, by Tobias Macey
