Latent Space: The AI Engineer Podcast

⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI



From the frontlines of OpenAI's Codex and GPT-5 training teams, Brian and Bill are building the future of AI-powered coding, where agents don't just autocomplete: they architect, refactor, and ship entire features while you sleep. We caught up with them at the AI Engineer Conference right after the launch of Codex Max, OpenAI's newest long-running coding agent, designed to work for 24+ hours straight, manage its own context, and spawn sub-agents to parallelize work across your entire codebase.

We sat down with Brian and Bill to dig into what it actually takes to train a model that developers trust. Topics include why personality, communication, and planning matter as much as raw capability; how Codex is trained with strong opinions about tools (it loves rg over grep, seriously); why the abstraction layer is moving from models to full-stack agents you can plug into VS Code or Zed; and how OpenAI and its coding partners co-develop tool integrations and discover unexpected model habits, such as tool calls working better once tools are renamed to match Codex's internal training. We also cover the rise of applied evals that measure real-world impact instead of academic benchmarks, why multi-turn evals are the next frontier (including Brian's "job interview eval" idea), and how coding agents are breaking out of code into personal automation, terminal workflows, and computer use. They close with their 2026 vision: coding agents trusted enough to handle the hardest refactors at any company, not just top-tier firms, and general enough to build integrations, organize your desktop, and unlock capabilities you'd never get access to otherwise.

We discuss:

  • What Codex Max is: a long-running coding agent that can work 24+ hours, manage its own context window, and spawn sub-agents for parallel work

  • Why the name "Max": maximalist, maximization, speed and endurance—it's simply better and faster for the same problems

  • Training for personality: communication, planning, context gathering, and checking your work as behavioral characteristics, not just capabilities

  • How Codex develops habits like preferring rg over grep, and why renaming tools to match its training (e.g., terminal-style naming) dramatically improves tool-call performance

  • The split between Codex (opinionated, agent-focused, optimized for the Codex harness) and GPT-5 (general, more durable across different tools and modalities)

  • Why the abstraction layer is moving up: from prompting models to plugging in full agents (Codex, GitHub Copilot, Zed) that package the entire stack

  • The rise of sub-agents and agents-using-agents: Codex Max spawning its own instances, handing off context, and parallelizing work across a codebase

  • How OpenAI works with coding partners on the bleeding edge to co-develop tool integrations and discover what the model is actually good at

  • The shift to applied evals: capturing real-world use cases instead of academic benchmarks, and why ~50% of OpenAI employees now use Codex daily

  • Why multi-turn evals are the next frontier: LLM-as-a-judge for entire trajectories, Brian's "job interview eval" concept, and the need for a batch multi-turn eval API

  • How coding agents are breaking out of code: personal automation, organizing desktops, terminal workflows, and "Devin for non-coding" use cases

  • Why Slack is the ultimate UI for work, and how coding agents can become your personal automation layer for email, files, and everything in between

  • The 2026 vision: more computer use, more trust, and coding agents capable enough that any company can access top-tier developer capabilities, not just elite firms

Brian & Bill (OpenAI Codex Team)

  • https://x.com/bfioca

  • https://x.com/realchillben

  • OpenAI Codex: https://openai.com/index/openai-codex/

Where to find Latent Space

  • X: https://x.com/latentspacepod

  • Substack: https://www.latent.space/

Chapters
  • 00:00:00 Introduction: Latent Space Listeners at AI Engineer Code
  • 00:01:27 Codex Max Launch: Training for Long-Running Coding Agents
  • 00:03:01 Model Personality and Trust: Communication, Planning, and Self-Checking
  • 00:05:20 Codex vs GPT-5: Opinionated Agents vs General Models
  • 00:07:47 Tool Use and Model Habits: The Ripgrep Discovery
  • 00:09:16 Personality Design: Verbosity vs Efficiency in Coding Agents
  • 00:11:56 The Agent Abstraction Layer: Building on Top of Codex
  • 00:14:08 Sub-Agents and Multi-Agent Patterns: The Future of Composition
  • 00:16:11 Trust and Adoption: OpenAI Developers Using Codex Daily
  • 00:17:21 Applied Evals: Real-World Testing vs Academic Benchmarks
  • 00:19:15 Multi-Turn Evals and the Job Interview Pattern
  • 00:21:35 Feature Request: Batch Multi-Turn Eval API
  • 00:22:28 Beyond Code: Personal Automation and Computer Use
  • 00:24:51 Vision-Native Agents and the UI Integration Challenge
  • 00:25:02 2026 Predictions: Trust, Computer Use, and Democratized Excellence


Latent Space: The AI Engineer Podcast, by swyx + Alessio
