
Stefano Ermon is the cofounder of Inception Labs and an associate professor at Stanford. Inception is developing a new type of AI model called diffusion LLMs.
Stefano's favorite book: If on a Winter's Night a Traveler (Author: Italo Calvino)
(00:01) Introduction
(00:38) What are autoregressive LLMs and how do they work
(02:28) How diffusion LLMs rethink generation
(04:02) The ceiling of autoregressive LLMs: cost, latency, reliability
(06:19) Why diffusion LLMs are commercially viable now
(09:12) Parallel refinement: how diffusion models generate text
(12:05) Understanding diffusion steps and efficiency
(13:49) Hardest engineering challenges at Inception
(15:23) From research to production: the power of data
(16:24) Where diffusion LLMs still lag behind
(18:18) Evaluations and benchmarks for diffusion LLMs
(20:20) Developer experience and OpenAI-compatible API
(21:47) Economics and GPU efficiency
(23:38) Hardware and runtime stack
(24:58) Competition and the evolving diffusion LLM landscape
(27:01) Where diffusion will win first — coding and agentic systems
(30:13) How diffusion changes infra, serving, and hardware design
(33:04) What’s next at Inception: reasoning and multimodality
(35:20) Rapid Fire Round
--------
Where to find Stefano Ermon:
LinkedIn: https://www.linkedin.com/in/ermon/
--------
Where to find Prateek Joshi:
Research column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi
By Prateek Joshi
