LessWrong posts by zvi

ChatGPT 5.1 Codex Max


Listen Later

OpenAI has given us GPT-5.1-Codex-Max, their best coding model for OpenAI Codex.

They claim it is faster, more capable and token-efficient and has better persistence on long tasks.

It scores 77.9% on SWE-bench-verified, 79.9% on SWE-Lancer-IC SWE and 58.1% on Terminal-Bench 2.0, all substantial gains over GPT-5.1-Codex.

It's triggering OpenAI to prepare for being high level in cybersecurity threats.

There's a 27 page system card. One could call this the secret ‘real’ GPT-5.1 that matters.

They even finally trained it to use Windows, somehow this is a new idea.

My goal is for my review of Opus 4.5 to start on Friday, as it takes a few days to sort through new releases. This post was written before Anthropic revealed Opus 4.5, and we don’t yet know how big an upgrade Opus 4.5 will prove to be. As always, try all your various options and choose what is best for you.

The Famous METR Graph

GPT-5.1-Codex-Max is a new high on the METR graph. METR's thread is here.

Prinz: METR (50% accuracy):

GPT-5.1-Codex-Max = 2 hours, 42 minutes

This is 25 minutes longer than GPT-5.

Samuel Albanie [...]

---

Outline:

(01:18) The Famous METR Graph

(02:46) The System Card

(03:43) Basic Disallowed Content

(04:17) Sandbox

(05:34) Mitigations For Harmful Tasks and Prompt Injections

(06:13) Preparedness Framework

(06:35) Biological and Chemical

(07:50) Cybersecurity

(11:58) AI Self-Improvement

(14:27) Reactions

---

First published:

November 25th, 2025

Source:

https://www.lesswrong.com/posts/YMFYQpsY2MGbXKPtS/chatgpt-5-1-codex-max

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

...more
View all episodesView all episodes
Download on the App Store

LessWrong posts by zviBy zvi

  • 5
  • 5
  • 5
  • 5
  • 5

5

2 ratings


More shows like LessWrong posts by zvi

View all
Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,330 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,453 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,089 Listeners

Future of Life Institute Podcast by Future of Life Institute

Future of Life Institute Podcast

108 Listeners

ChinaTalk by Jordan Schneider

ChinaTalk

289 Listeners

Politix by Politix

Politix

94 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

511 Listeners

Hard Fork by The New York Times

Hard Fork

5,512 Listeners

Clearer Thinking with Spencer Greenberg by Spencer Greenberg

Clearer Thinking with Spencer Greenberg

138 Listeners

LessWrong (Curated & Popular) by LessWrong

LessWrong (Curated & Popular)

13 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

131 Listeners

"Econ 102" with Noah Smith and Erik Torenberg by Turpentine

"Econ 102" with Noah Smith and Erik Torenberg

152 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

467 Listeners

LessWrong (30+ Karma) by LessWrong

LessWrong (30+ Karma)

0 Listeners

Complex Systems with Patrick McKenzie (patio11) by Patrick McKenzie

Complex Systems with Patrick McKenzie (patio11)

134 Listeners