Latent Space: The AI Engineer Podcast

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI



Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall

Timestamps

00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions

Chapters
  • 00:00:00 Intro & Guest Welcome
  • 00:00:33 Diplomacy AI & Cicero Insights
  • 00:03:49 AI Safety, Language Models, and Steerability
  • 00:05:23 O Series Models: Progress and Benchmarks
  • 00:08:53 Reasoning Paradigm: Thinking Fast and Slow in AI
  • 00:14:02 Design Questions: Harnesses, Tools, and Test Time Compute
  • 00:20:32 Reinforcement Fine-tuning & Model Specialization
  • 00:21:52 The Rise of Reasoning Models at OpenAI
  • 00:29:33 Data Efficiency in Machine Learning
  • 00:33:21 Coding & AI: Codex, Workflows, and Developer Experience
  • 00:41:38 Multi-Agent AI: Collaboration, Competition, and Civilization
  • 00:45:14 Poker, Diplomacy & Exploitative vs. Optimal AI Strategy
  • 00:52:11 World Models, Multi-Agent Learning, and Self-Play
  • 00:58:50 Generative Media: Image & Video Models
  • 01:00:44 Robotics: Humanoids, Iteration Speed, and Embodiment
  • 01:04:25 Rapid Fire: Research Practices, Benchmarks, and AI Progress
  • 01:14:19 Games, Imperfect Information, and AI Research Directions

Latent Space: The AI Engineer Podcast, by swyx + Alessio

4.7 (66 ratings)

