
Diffusion models changed how we generate images and video—now they’re coming for text.
In this episode, we sit down with Stefano Ermon, Stanford computer science professor and founder of Inception Labs, to unpack how diffusion works for language, why it can generate in parallel (instead of token-by-token), and what that means for latency, cost, and real-time AI products.
We talk through:
- The simplest mental model for diffusion: generate a full draft, then refine it by "fixing mistakes"
- Why today's autoregressive LLM inference is often memory-bound—and why diffusion can shift it toward a more GPU-friendly compute profile
- Where Mercury wins today (IDEs, voice/real-time agents, customer support, EdTech—anywhere humans can't wait)
- What changes (and what doesn't) for long context and architecture choices
- The real-world way to evaluate models in production: offline evals + the gold-standard A/B test
Stefano also shares what’s next on Mercury’s roadmap—especially around stronger planning and reasoning for agentic use cases.
Try Mercury + learn more: inceptionlabs.ai
For more practical, grounded conversations on AI systems that actually work, subscribe to The Neuron newsletter at https://theneuron.ai.
By The Neuron · 4.8 (6363 ratings)