The Information Bottleneck

The Principles of Diffusion Models - with Jesse Lai (Sony AI)



We host Chieh-Hsin (Jesse) Lai, Staff Research Scientist at Sony AI and visiting professor at National Yang Ming Chiao Tung University, Taiwan, for a conversation about diffusion models, the technology behind tools like Stable Diffusion and most of the AI image and video generators you've seen in the last few years. Jesse recently co-authored The Principles of Diffusion Models with Stefano Ermon, and the book is quickly becoming a go-to reference in the field.

We start with what a generative model actually is, and what it means to "generate" an image or a sound. Jesse explains the core idea behind diffusion in plain terms. You start with pure noise, and a neural network gradually cleans it up, step by step, until a realistic image emerges.

From there, we talk about why diffusion has come to dominate so much of generative AI. Because the model builds an image gradually, you can guide it along the way, nudging the output toward what you actually want, refining details, or combining it with other controls. We also discuss the common critique that diffusion is slow, and how the field has largely addressed it with newer techniques such as consistency models and the push toward one-step generation.

We zoom out to the bigger picture, too. Jesse shares his view on world models and whether diffusion is the right foundation for them. We talk about what makes a generative model genuinely good versus just good at gaming benchmarks, and why evaluating creativity and realism is so much harder than scoring a multiple-choice test.

Timeline

00:12 - Intro and welcoming Jesse

00:47 - Why Jesse wrote the book, and who it's for

03:29 - The three families of diffusion models, and why they're really one idea

05:14 - What makes a good generative model

07:39 - How do you even measure if a generated image is good?

08:59 - Why diffusion beats autoregressive models for images

10:33 - Is diffusion still slow? How generation got fast

11:12 - A simple intuition for what a "score" is

14:12 - How the different flavors of diffusion connect under the hood

14:42 - Diffusion for text and proteins

17:12 - Consistency models and the push for one-step generation

22:12 - Diffusion for world models: simulating reality in real time

26:12 - Do world models need to understand language?

35:12 - Is diffusion the right tool, or just a convenient one?

38:12 - What benchmarks actually tell us, and what they miss

46:12 - Closing thoughts and where to find the book

Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
