AI is moving fast. But the data foundations underneath it are not. That is the core theme of my latest interview with Rajan Goyal, Founder and CEO of DataPelago on The Ravit Show, recorded at their office in Mountain View.
Here is what we unpacked:
Why now
- Three shifts are colliding at once. Hardware acceleration is now mainstream. Generative AI has changed how data is created and consumed. Data complexity has exploded beyond what existing systems were designed to handle. The result is clear. Enterprises do not just need faster systems. They need a more unified data foundation.
Where the tension is
- AI models are advancing quickly, from multimodal systems to agents and domain-specific LLMs. But the data infrastructure beneath them is still built for an analytics-first world. Most companies spend more time moving data between systems than actually innovating with it.
What DataPelago is building
- We spent time breaking down DataPelago Nucleus, described as the world’s first universal data processing engine. One engine that can handle batch, streaming, relational, vector, and tensor workloads together. The key idea is simple but powerful. Ingest, transform, and query data without constantly moving it across systems.
We also talked about what makes their approach different:
- A DataOS layer that intelligently maps workloads across CPUs, GPUs, and other accelerators.
- A DataApp layer that plugs into engines like Spark and Trino.
- DataVM, a data-focused virtual machine that unifies execution across heterogeneous hardware.
Why Spark acceleration matters
- For teams running Spark today, we discussed the DataPelago Accelerator for Spark. It runs existing Spark workloads on accelerated compute with zero code changes: faster joins, shuffles, and preprocessing at lower cost, without rewriting pipelines.
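For context, here is a minimal sketch of what "zero code changes" usually looks like in practice with Spark-native accelerators, which hook in through Spark's standard plugin mechanism. The plugin class name and data paths below are hypothetical placeholders for illustration, not DataPelago's actual API; the point is that the pipeline code itself stays as-is.

```python
# Hedged sketch: enabling a Spark-native accelerator through Spark's plugin hook.
# The plugin class name and paths are hypothetical placeholders, not DataPelago's API.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("accelerated-etl")
    # Spark 3.x plugin entry point; accelerators typically register themselves here.
    .config("spark.plugins", "com.example.AcceleratorPlugin")  # hypothetical class
    .getOrCreate()
)

# Existing pipeline code runs unchanged. Joins, shuffles, and aggregations are the
# operators an accelerator would intercept and execute on faster compute.
orders = spark.read.parquet("s3://bucket/orders")        # hypothetical paths
customers = spark.read.parquet("s3://bucket/customers")

daily_revenue = (
    orders.join(customers, "customer_id")                # join -> shuffle
          .groupBy("order_date", "region")               # aggregation -> shuffle
          .sum("amount")
)
daily_revenue.write.mode("overwrite").parquet("s3://bucket/daily_revenue")
```

The integration surface is the configuration layer, not the application code, which is why existing DataFrame and SQL pipelines can pick up the acceleration without a rewrite.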
Why today’s stack is breaking
- Warehouses, lakes, and lakehouses were built for SQL analytics. AI workloads need tight coupling between data and compute. The separation we see today leads to redundant pipelines, silos, and expensive data movement. Many teams are forced to optimize for analytics or AI, but not both.
Why DataPelago was founded and what customers see
- The founding insight was clear. Data systems were never designed for AI-scale throughput. Customers adopting this approach are unifying analytics and AI pipelines on one platform, simplifying infrastructure while improving performance, governance, and observability. Rajan made an interesting comparison: this shift in data processing is similar to what GPUs did for compute.
What’s next
- We closed by talking about how the data and AI relationship will evolve over the next few years, and what this looks like in real-world deployments. That is what the next episode will dive into.
If you are building AI systems and still relying on analytics-era data foundations, this one is worth your time.
#data #ai #gpu #datapelago #lakehouse #sql #analytics #theravitshow