The GeekNarrator

Many Databases 1 LSM Engine - OpenData



The episode explores why modern databases keep reinventing the same distributed-systems machinery and argues that a major part of database cost is the operational tax of running replication-heavy systems. Our guest, Almog Gavra, co-founder of Responsive, explains how his team pivoted from operating Kafka Streams as a service to building SlateDB and the “Open Data” manifesto: an object-storage-native LSM foundation that can power multiple database types (vector, time series, logs, key-value) with shared tuning knobs and failure modes.

They discuss why distributed-systems complexity is often harder than query engines, how LSM trees provide a tunable tradeoff between read, write, and space amplification, caching layers and cost transparency, separating readers from writers, stateless ingest, single-writer availability and fencing via S3 compare-and-set, offloading compaction, and how the architecture enables near-free snapshots. They also cover when this approach doesn’t fit: OLTP workloads that can stay on Postgres, and ultra-low-latency workloads where cold object-store misses are unacceptable.

Chapters:
00:00 Introduction
08:36 Open Data Manifesto
18:34 Specialized vs General
25:10 SlateDB Architecture
32:51 LSM Trees as Tuning Dial
38:58 Tuning Without Overload
39:46 Cost Aware Config Knobs
41:51 Latency Cost Durability Tradeoffs
46:46 Caching Strategies And Layers
50:23 Split Readers And Writers
52:43 Single Writer Versus Multi Writer
55:16 Scaling And Partitioning Writes
58:58 Failure Modes And Fencing
01:05:23 Compaction As Separate Worker
01:09:28 Snapshots And Garbage Collection
01:10:25 When Open Data Is Not Fit

Important links and references:
OpenData: http://github.com/opendata-oss/opendata
OpenData manifesto: https://www.opendata.dev/blog/manifesto
Reach out to Almog: https://www.linkedin.com/in/agavra/ or https://x.com/almoggavra
Dostoevsky paper on LSM: https://nivdayan.github.io/dostoevsky.pdf
Latency/Cost/Durability Triad: https://materializedview.io/p/cloud-storage-triad-latency-cost-durability
SlateDB: https://github.com/slatedb/slatedb
"how SSTs work": https://www.bitsxpages.com/p/sorted-string-tables-sst-from-first

For memberships, join this channel as a member here:
https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join

Don't forget to like, share, and subscribe for more insights!

=============================================================================
Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, and SQLite. Use the link below to sign up and get 40% off a paid subscription.
https://app.codecrafters.io/join?via=geeknarrator
=============================================================================

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!


The GeekNarrator by Kaivalya Apte

5

3 ratings