August 16, 2025

[Linkpost] “Anthropic Lets Claude Opus 4 & 4.1 End Conversations” by Stephen Martin

5 minutes

This is a link post.

Citing model welfare concerns, Anthropic has given Claude Opus 4 & 4.1 the ability to end ongoing conversations with users.

Most of the model welfare concerns Anthropic cites trace back to what they discussed in the Claude 4 Model System Card.

Claude's aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.

I think this may be the first real chance to measure public sentiment on a Model Welfare measure that even slightly inconveniences human users, so I want to document here on LW the reactions I see. I source these reactions primarily from X, so algorithmic bias is possible.

On X [...]

---

First published:
August 16th, 2025

Source:
https://www.lesswrong.com/posts/HGyKm2be6u3EeYv9G/anthropic-lets-claude-opus-4-and-4-1-end-conversations

Linkpost URL:
https://www.anthropic.com/research/end-subset-conversations

---

Narrated by TYPE III AUDIO.
