
Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall
Timestamps
00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions