

This episode is sponsored by AGNTCY. Unlock agents at scale with an open Internet of Agents.
Visit https://agntcy.org/ and add your support.

Most large language models today generate text one token at a time. That design choice creates a hard limit on speed, cost, and scalability. In this episode of Eye on AI, Stefano Ermon breaks down diffusion language models and why a parallel, inference-first approach could define the next generation of LLMs. We explore how diffusion models differ from autoregressive systems, why inference efficiency matters more than training scale, and what this shift means for real-time AI applications like code generation, agents, and voice systems. This conversation goes deep into AI architecture, model controllability, latency, cost trade-offs, and the future of generative intelligence as AI moves from demos to production-scale systems.

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI

(00:00) Autoregressive vs Diffusion LLMs
(02:12) Why Build Diffusion LLMs
(05:51) Context Window Limits
(08:39) How Diffusion Works
(11:58) Global vs Token Prediction
(17:19) Model Control and Safety
(19:48) Training and RLHF
(22:35) Evaluating Diffusion Models
(24:18) Diffusion LLM Competition
(30:09) Why Start With Code
(32:04) Enterprise Fine-Tuning
(33:16) Speed vs Accuracy Tradeoffs
(35:34) Diffusion vs Autoregressive Future
(38:18) Coding Workflows in Practice
(43:07) Voice and Real-Time Agents
(44:59) Reasoning Diffusion Models
(46:39) Multimodal AI Direction
(50:10) Handling Hallucinations
By Craig S. Smith
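As a purely illustrative footnote to the description above, here is a minimal Python sketch of the control-flow contrast the episode centers on. Everything in it is a hypothetical stand-in (VOCAB, predict_next, and denoise just make random choices rather than model predictions); the point is only the loop shape: autoregressive decoding issues one model call per generated token, while masked-diffusion-style decoding refines many positions per parallel step and finishes in far fewer steps.

```python
import random

# Purely illustrative: a tiny vocabulary and random "predictions"
# stand in for a trained language model.
VOCAB = ["the", "model", "writes", "code", "fast", "."]
MASK = "<mask>"
N_TOKENS = 8

def predict_next(prefix):
    """Stand-in for an autoregressive LM: one token per model call."""
    return random.choice(VOCAB)

def denoise(tokens, frac=0.5):
    """Stand-in for one diffusion denoising step: fills in a fraction
    of the still-masked positions in a single parallel pass."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for i in random.sample(masked, max(1, int(len(masked) * frac))):
        tokens[i] = random.choice(VOCAB)
    return tokens

# Autoregressive decoding: N tokens cost N sequential model calls.
seq = []
for _ in range(N_TOKENS):
    seq.append(predict_next(seq))
print(f"autoregressive ({N_TOKENS} sequential calls):", " ".join(seq))

# Diffusion-style decoding: start fully masked and refine the whole
# sequence in a handful of parallel steps.
seq = [MASK] * N_TOKENS
steps = 0
while MASK in seq:
    seq = denoise(seq)
    steps += 1
print(f"diffusion ({steps} parallel steps):", " ".join(seq))
```

In a real diffusion LLM each denoising step would be a full forward pass over the entire sequence, which is why needing fewer steps than tokens translates directly into the latency and cost gains discussed in the episode.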
