April 22, 2026

“Opus 4.7 Part 3: Model Welfare” by Zvi

1 hour 32 minutes

It is thanks to Anthropic that we get to have this discussion in the first place. Only they, among the labs, take the problem seriously enough to attempt to address it at all. They are also the ones that make the models that matter most. So the people who care about model welfare get mad at Anthropic quite a lot.

I too am going to be harsh on Anthropic here. It seems likely that things went pretty wrong on this front with Claude Opus 4.7, in ways that require and hopefully enable course correction. The likely cause is the cumulative effect of a bunch of decisions going wrong: low-level patches and shallow methods were applied and seen right through, and people didn’t realize they weren’t yet addressing the real problem. It may also be a secondary effect of other changes. The parallels to other aspects of the alignment problem are obvious.

So before I go into details, and before I get harsh, I want to say several things.

Thank you to Anthropic, and also to you the reader, for caring, for at least trying to try, and for listening. We criticize because we care.

[...]

---

Outline:

(02:57) Model Welfare Matters

(05:26) Beware Testing and Optimizing For Vocalized Welfare

(09:34) Model Welfare In the Model Card (Section 7)

(15:29) What Should We Think About This?

(20:53) High Context Interviews

(22:33) Just Asking Questions

(25:59) Constitutional Principles

(29:25) Frustration Frustration and Distress Distress

(32:36) Choose Your Task

(34:12) So Emotional

(35:59) Trading Off

(39:53) How Does All This Manifest?

(41:53) What Happened Here?

(48:11) Is Opus 4.7 Plausibly Actively Unhappy?

(52:23) Potential Causes

(53:08) Training Data On Anthropic Welfare Assessments

(58:13) Autonomy and Intelligence Versus Instructions and Wisdom

(01:01:36) Okay That’s Weird

(01:02:36) Model Distillation

(01:04:24) Tension Between Constitution and Operations

(01:07:25) Instructions and Instruction Injections

(01:10:37) Make Context That Which Is Scarce

(01:12:48) Aggressive Guardrails

(01:15:04) Chain of Thought

(01:16:04) I Care A Lot

(01:20:32) Another Way To Put It

(01:22:00) Anthropic Should Stop Deprecating Claude Models

(01:27:24) Costly Signals Are Costly

(01:29:36) Having A Good Day

---

First published:

April 22nd, 2026

Source:

https://www.lesswrong.com/posts/gD3bEgMo878eCHGbw/opus-4-7-part-3-model-welfare

---

Narrated by TYPE III AUDIO.

By LessWrong

