LessWrong posts by zvi

“GPT-5.4 Is A Substantial Upgrade” by Zvi



Benchmarks have never been less useful for telling us which models are best.

They are good for giving a general sense of the landscape. They definitely paint a picture. But if you’re comparing top models, like GPT-5.4 against Opus 4.6 against Gemini 3.1 Pro, you have to use the models, talk to the models, get reports from those who have, and form a gestalt. The reports will contradict each other and you have to work through that. There's no other way.

Thus, I try to gather and sort a reasonably comprehensive set of reactions, so you can browse the sections that make you most curious.

The gestalt is that GPT-5.4 is a very good model, sir. It's a substantial upgrade from GPT-5.2, and also from 5.3-Codex, and it puts OpenAI back in the game, whereas I felt like Opus 4.6 dominated OpenAI's previous offerings for all but narrow uses.

Each lab's models vary and things change over time, but they tend to have consistent strengths, weaknesses and personalities. From what I’ve seen this is very much an OpenAI model. It's highly capable, and it is especially seen as a big improvement by the whisperers and [...]

---

Outline:

(01:42) The Big Take

(04:24) The Official Pitch

(08:43) Other People's Benchmarks

(12:44) The System Card

(17:22) Preparedness Framework

(19:15) Fun Experiments

(19:41) Early Poll Results

(21:48) Positive Reactions

(34:47) Vibe Coders Only

(37:26) Fill Out Your Roster

(37:51) Intent Wins

(40:41) Personality Clash

(45:44) Model Relations Department

(49:27) Stylistic Differences

(50:02) Some Will Always Be Unimpressed

(53:57) The Lighter Side

---

First published:

March 11th, 2026

Source:

https://www.lesswrong.com/posts/sKCYLEN5EYLuokDft/gpt-5-4-is-a-substantial-upgrade

---

Narrated by TYPE III AUDIO.

---

Images from the article:

[Image: poll results. An option ending in "GPT" (label truncated) at 9.4%, "Claude -> Claude" at 37.9%, "GPT -> GPT" at 13.6%, and "Other / See Results" at 39.1%. The poll has 683 votes with 1 day left.]

[Image: poll results. An option ending in "GPT" (label truncated) at 6%, "Claude -> Claude" at 38.2%, "GPT -> GPT" at 15.3%, and "Other / See Results" at 40.6%. The poll received 419 votes and shows final results.]

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
