


Diffusion models changed how we generate images and video—now they’re coming for text.
In this episode, we sit down with Stefano Ermon, Stanford computer science professor and founder of Inception Labs, to unpack how diffusion works for language, why it can generate in parallel (instead of token-by-token), and what that means for latency, cost, and real-time AI products.
We talk through:
- The simplest mental model for diffusion: generate a full draft, then refine it by “fixing mistakes”
- Why today’s autoregressive LLM inference is often memory-bound—and why diffusion can shift it toward a more GPU-friendly compute profile
- Where Mercury wins today (IDEs, voice/real-time agents, customer support, EdTech—anywhere humans can’t wait)
- What changes (and what doesn’t) for long context and architecture choices
- The real-world way to evaluate models in production: offline evals + the gold-standard A/B test
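To make that “draft, then refine” mental model concrete, here is a toy Python sketch. It is not Inception Labs’ actual algorithm (Mercury’s internals aren’t described here); it only illustrates the shape of parallel denoising: start from a fully masked sequence and reveal several positions per step, instead of emitting one token at a time. The `TARGET` sequence and `denoise_step` helper are made up for illustration.

```python
# Toy sketch of the "draft, then refine" idea behind text diffusion.
# NOT a real model: a real denoiser would *predict* tokens; here we
# just copy from a known target to show the parallel-refinement loop.
import random

TARGET = ["diffusion", "models", "generate", "text", "in", "parallel"]
MASK = "<mask>"

def denoise_step(draft, target, k):
    """Reveal up to k masked positions at once (parallel refinement)."""
    masked = [i for i, tok in enumerate(draft) if tok == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        draft[i] = target[i]  # stand-in for the model's prediction
    return draft

def generate(target, steps=3):
    draft = [MASK] * len(target)      # step 0: an all-noise "draft"
    per_step = max(1, len(target) // steps)
    while MASK in draft:              # refine until no masks remain
        draft = denoise_step(draft, target, per_step)
    return draft

print(" ".join(generate(TARGET)))
```

The key contrast with autoregressive decoding: each loop iteration touches many positions at once, so the number of sequential steps can be far smaller than the sequence length.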
Stefano also shares what’s next on Mercury’s roadmap—especially around stronger planning and reasoning for agentic use cases.
Try Mercury + learn more: inceptionlabs.ai
For more practical, grounded conversations on AI systems that actually work, subscribe to The Neuron newsletter at https://theneuron.ai.
By The Neuron
