LessWrong posts by zvi

“GPT-5.4 Is A Substantial Upgrade” by Zvi

Benchmarks have never been less useful for telling us which models are best.

They are good for giving a general sense of the landscape. They definitely paint a picture. But if you’re comparing top models, like GPT-5.4 against Opus 4.6 against Gemini 3.1 Pro, you have to use the models, talk to the models, get reports from those who have, and form a gestalt. The reports will contradict each other and you have to work through that. There's no other way.

Thus, I try to gather and sort a reasonably comprehensive set of reactions, so you can browse the sections that make you most curious.

The gestalt is that GPT-5.4 is a very good model, sir. It's a substantial upgrade from GPT-5.2, and also from 5.3-Codex, and it puts OpenAI back in the game, whereas I felt like Opus 4.6 dominated OpenAI's previous offerings for all but narrow uses.

Each lab's models vary and things change over time, but they tend to have consistent strengths, weaknesses and personalities. From what I’ve seen this is very much an OpenAI model. It's highly capable, and it is especially seen as a big improvement by the whisperers and [...]

---

Outline:

(01:42) The Big Take

(04:24) The Official Pitch

(08:43) Other People's Benchmarks

(12:44) The System Card

(17:22) Preparedness Framework

(19:15) Fun Experiments

(19:41) Early Poll Results

(21:48) Positive Reactions

(34:47) Vibe Coders Only

(37:26) Fill Out Your Roster

(37:51) Intent Wins

(40:41) Personality Clash

(45:44) Model Relations Department

(49:27) Stylistic Differences

(50:02) Some Will Always Be Unimpressed

(53:57) The Lighter Side

---

First published:

March 11th, 2026

Source:

https://www.lesswrong.com/posts/sKCYLEN5EYLuokDft/gpt-5-4-is-a-substantial-upgrade

---

Narrated by TYPE III AUDIO.

---

Images from the article:

[Image: poll results. A first option with a truncated label (ending in "-> GPT") at 9.4%, "Claude -> Claude" at 37.9%, "GPT -> GPT" at 13.6%, and "Other / See Results" at 39.1%; 683 votes, 1 day left.]

[Image: poll results. A first option with a truncated label (ending in "-> GPT") at 6%, "Claude -> Claude" at 38.2%, "GPT -> GPT" at 15.3%, and "Other / See Results" at 40.6%; 419 votes, final results.]
