Dwarkesh Podcast

Eric Jang – Building AlphaGo from scratch


Listen Later

Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Watch on YouTube. Read the transcript.

And check out the flashcards I wrote to retain the insights.

Sponsors

* Cursor‘s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh

* Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh

Timestamps

(00:00:00) – Basics of Go

(00:08:17) – Monte Carlo Tree Search

(00:32:04) – What the neural network does

(01:00:33) – Self-play

(01:25:38) – Alternative RL approaches

(01:45:47) – Why doesn't MCTS work for LLMs

(02:01:09) – Off-policy training

(02:12:02) – RL is even more information inefficient than you thought

(02:22:16) – Automated AI researchers



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
...more
View all episodesView all episodes
Download on the App Store

Dwarkesh PodcastBy Dwarkesh Patel

  • 4.6
  • 4.6
  • 4.6
  • 4.6
  • 4.6

4.6

475 ratings


More shows like Dwarkesh Podcast

View all
The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch by Harry Stebbings

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch

536 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,461 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,105 Listeners

Y Combinator Startup Podcast by Y Combinator

Y Combinator Startup Podcast

233 Listeners

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC

All-In with Chamath, Jason, Sacks & Friedberg

10,254 Listeners

Big Technology Podcast by Alex Kantrowitz

Big Technology Podcast

512 Listeners

Moonshots with Peter Diamandis by PHD Ventures

Moonshots with Peter Diamandis

602 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

150 Listeners

Latent Space: The AI Engineer Podcast by Latent.Space

Latent Space: The AI Engineer Podcast

101 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

688 Listeners

"Econ 102" with Noah Smith and Erik Torenberg by Turpentine

"Econ 102" with Noah Smith and Erik Torenberg

147 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

475 Listeners

AI + a16z by a16z

AI + a16z

34 Listeners

TBPN by John Coogan & Jordi Hays

TBPN

140 Listeners

Uncapped with Jack Altman by Alt Capital

Uncapped with Jack Altman

42 Listeners