
Stefano Ermon is the cofounder of Inception Labs and an associate professor at Stanford. Inception is developing a new type of AI model called Diffusion LLMs.
Stefano's favorite book: If on a Winter's Night a Traveler (Author: Italo Calvino)
(00:01) Introduction
(00:38) What are autoregressive LLMs and how do they work
(02:28) How diffusion LLMs rethink generation
(04:02) The ceiling of autoregressive LLMs: cost, latency, reliability
(06:19) Why diffusion LLMs are commercially viable now
(09:12) Parallel refinement: how diffusion models generate text
(12:05) Understanding diffusion steps and efficiency
(13:49) Hardest engineering challenges at Inception
(15:23) From research to production: the power of data
(16:24) Where diffusion LLMs still lag behind
(18:18) Evaluations and benchmarks for diffusion LLMs
(20:20) Developer experience and OpenAI-compatible API
(21:47) Economics and GPU efficiency
(23:38) Hardware and runtime stack
(24:58) Competition and the evolving diffusion LLM landscape
(27:01) Where diffusion will win first — coding and agentic systems
(30:13) How diffusion changes infra, serving, and hardware design
(33:04) What’s next at Inception: reasoning and multimodality
(35:20) Rapid Fire Round
--------
Where to find Stefano Ermon:
LinkedIn: https://www.linkedin.com/in/ermon/
--------
Where to find Prateek Joshi:
Research column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi
