Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served



Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served.

It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It’s a bit technical, but I encourage you to hang in there – it’s really worth it.

Fewer than a handful of people understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

Recommend watching this one on YouTube so you can see the chalkboard.

Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.

Download markdown of transcript here to chat with an LLM.

Wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope they're helpful to you too!

Sponsors

* Jane Street needs constant access to incredibly low-latency compute. I recently asked one of their engineers, Clark, to talk me through how they meet these demands. Our conversation—which touched on everything from FPGAs to liquid cooling—was extremely helpful as I prepped to interview Reiner. You can watch the full discussion and explore Jane Street’s open roles at janestreet.com/dwarkesh

* Google’s Gemma 4 is the first open model that’s let me shut off the internet and create a fully disconnected “focus machine”. This is because Gemma is small enough to run on my laptop, but powerful enough to actually be useful. So, to prep for this interview, I downloaded Reiner’s scaling book, disconnected from wifi, and used Gemma to help me break down the material. Check it out at goo.gle/Gemma4

* Cursor helped me turn some notes I took on how gradients flow during large-scale pretraining into a great animation. At first, I wasn’t sure the best way to visualize the concept, but Cursor’s Composer 2 Fast model let me iterate on different ideas almost instantaneously. You can check out the animation in my recent blog post. And if you have something to visualize yourself, go to cursor.com/dwarkesh

Timestamps

(00:00:00) – How batch size affects token cost and speed

(00:32:09) – How MoE models are laid out across GPU racks

(00:47:12) – How pipeline parallelism spreads model layers across racks

(01:03:37) – Why Ilya said, “As we now know, pipelining is not wise.”

(01:18:59) – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal

(01:33:02) – Deducing long context memory costs from API pricing

(02:04:02) – Convergent evolution between neural nets and cryptography



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Dwarkesh Podcast, by Dwarkesh Patel

4.6

475 ratings

