
Sign up to save your podcasts
Or


Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall
Timestamps
00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions
By swyx + Alessio4.7
8686 ratings
Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall
Timestamps
00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions

536 Listeners

291 Listeners

1,095 Listeners

303 Listeners

340 Listeners

237 Listeners

214 Listeners

197 Listeners

505 Listeners

131 Listeners

209 Listeners

591 Listeners

521 Listeners

22 Listeners

40 Listeners