LessWrong posts by zvi

“Claude Opus 4.8: Capabilities and Reactions” by Zvi



You need a lot of data points to understand a new model, and to know what you have.

Trying to gauge a model from a few benchmarks is misleading. But if you have dozens of them, from a variety of sources, and you put them together with the model card tests and the model welfare information, you can start to see a consistent pattern.

Trying to gauge reactions requires volume and calibration, now more than ever, because people are definitively nuts, or at least draw global conclusions from local data. There will always be people saying that the new model is bad, or that the service got bad, or that it got bad in a particular way in which it clearly got good. I definitely notice the people saying 4.8 is a terrible model, despite this obviously not being true.

And others will say it's great, again regardless of the underlying value. But with the reaction threads and good calibration, you can pick out the patterns.

The model welfare information helps a lot, too. You are dealing with a mind that has a bunch of characteristics that all make sense together. This helps you make that sense.

Self-Portrait by Opus 4.8, rendered [...]

---

Outline:

(01:30) The Official Pitch

(02:15) But Wait There's More

(06:37) It's A Good Model, Sir

(08:04) Official Benchmarks (Including System Card Section 8)

(15:41) Other People's Benchmarks

(21:34) Your Regularly Scheduled Jailbreak

(22:45) Every.To Is Really Into Opus 4.8

(26:36) Miscellaneous Positive Reactions

(28:45) Haters Gonna Hate

(28:58) Just The Tasks, Ma'am

(29:24) It's Greek To Me

(29:52) Honesty

(38:13) Sycophancy

(41:24) In A Trenchcoat

(42:53) Don't Let AIs Edit Your Writing

(48:21) Some Say It Is Judgy

(50:50) You Have Not Been A Good User

(51:48) Laziness

(52:46) Code

(58:29) Wet Versus Dry

(59:54) Intelligence

(01:01:25) Silly Wabbits

(01:02:30) A Model Welfare Addendum

(01:06:27) Putting It All Together

---

First published: June 2nd, 2026

Source: https://www.lesswrong.com/posts/AfLGv6u9eZNuFHb4c/claude-opus-4-8-capabilities-and-reactions

---

Narrated by TYPE III AUDIO.
