Deep Dive into DuckDB with CTO Mark Raasveldt
Explore database internals with the Geek Narrator podcast. In this episode, host Kaivalya Apte talks with Mark Raasveldt, CTO of DuckDB Labs, about his journey from database enthusiast to creating DuckDB. They delve into how DuckDB, an analytical database, differs from other databases, the design decisions behind it, its internal mechanisms, and much more. The episode also covers the advantages of DuckDB for analytics, the motivation behind its ACID compliance, and how DuckDB handles ingestion, transaction isolation, mutations, and queries. Join in to learn how your data workloads can benefit from DuckDB.
00:00 Introduction and Guest Introduction
00:44 Guest's Journey into Databases
03:40 The Birth of DuckDB
04:30 Challenges with Existing Databases
05:15 Technical Difficulties
05:16 Why Existing Databases Fall Short for Data Scientists
09:16 The Role of SQLite and Its Limitations
13:59 Defining DuckDB
16:48 Comparing DuckDB with Other Analytical Databases
19:50 Deployment Models for DuckDB
22:47 Data Ingestion into DuckDB
22:51 Data Ingestion in DuckDB
30:24 How DuckDB Handles Updates and Mutations
35:35 Understanding Column Granularity and Rewrites
35:58 Implications of Compression on Data Updates
36:38 Trade-offs in Row Group Size
37:32 Benefits of Column Storage Model
38:15 Row Groups and Parallelism
39:02 Choosing Row Group Size: An Experimental Approach
40:00 Handling Data Type Changes in Columns
41:00 Internal Data Structures in DuckDB
42:21 Reading Data: Point Lookups, Aggregations, and Joins
47:22 Optimization for Full Table Scans
53:49 Understanding ACID Compliance in DuckDB
55:49 Multi-Version Concurrency Control (MVCC) in DuckDB
59:50 Use Cases and Applications of DuckDB
01:01:42 The Story Behind DuckDB's Name
01:02:34 Future Vision for DuckDB
References:
DuckDB: https://duckdb.org/
Mark's blog: https://mytherin.github.io/
===============================================================================
For a discount on the courses below:
Appsync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Click the "Add Discount" button and enter the discount code "geeknarrator" to get 20% off.
===============================================================================
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network.
Also please subscribe if you haven't yet.
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning!
Cheers,