Spatial Stack with Matt Forrest

#38: How Apache Sedona Solved Big Data’s Hardest Problem with Jia Yu



Large language models can write poetry and debug code, but they still don't understand the fundamental physics of the real world. Ask an AI to find the "nearest restaurant" to a specific coordinate, and it struggles because it lacks spatial intelligence.

In this episode, we sit down with Jia Yu, the co-creator of Apache Sedona and co-founder of Wherobots, to discuss why geospatial data breaks standard big data engines and how he built a solution that now sees over 2 million downloads a month.

We trace the 10-year journey from a PhD research paper to a top-level Apache project, diving into the deep technical challenges of distributed computing. Jia explains why spatial data requires a completely different architecture than standard text or numbers and how the industry is finally moving toward a "Spatial Lakehouse" to break down data silos.

In this episode, we explore:

- The "Multimodality" Trap: Why mixing vector, raster, and LiDAR data crashes traditional systems.

- How SedonaDB is bringing massive scale to single-node machines (so you don't always need a cluster).

- The hardest problem in distributed computing: how to split a map across 1,000 servers without breaking the data.

- The multi-year fight to get native geometry support into Apache Iceberg.

- Why the next generation of models must evolve from text-based to spatially intelligent.

✅ Sign Up for Wherobots: https://wherobots.com/
✅ Learn more about Apache Sedona: https://wherobots.com/apache-sedona/
✅ What is Apache Sedona: https://wherobots.com/blog/what-is-apache-sedona/
✅ Test out SedonaDB: https://sedona.apache.org/sedonadb/latest/
✅ Connect with Jia on LinkedIn: https://www.linkedin.com/in/dr-jia-yu/ 

00:00:00 - Intro & Welcome 
00:00:51 - The Origin Story: From GeoSpark to Apache Sedona 
00:06:03 - Why Geospatial Data is "Special" (The Multimodality Problem) 
00:09:47 - When to Move to Distributed Computing? 
00:13:21 - The Secret to Maintaining a Vibrant Open Source Community
00:18:11 - The Features That Drove Adoption: Spatial SQL & Python 
00:22:35 - Deep Dive: How Spatial Partitioning Works 
00:28:57 - Why Build a Cloud-Native Platform? 
00:33:05 - The Rise of the Spatial Lakehouse & Apache Iceberg 
00:40:17 - Introducing SedonaDB: A Single-Node Engine 
00:45:10 - The Future: Why AI Needs Spatial Intelligence 
00:48:44 - Advice for Getting Started with Spatial Engineering

📰 Daily modern GIS insights: https://forrest.nyc

CONNECT WITH ME
📸 Instagram:  https://www.instagram.com/matt_forrest/
💼 LinkedIn: https://www.linkedin.com/in/mbforr/
📧 Newsletter: https://forrest.nyc
🌐 Website: https://forrest.nyc
