


Citing model welfare concerns, Anthropic has given Claude Opus 4 & 4.1 the ability to end ongoing conversations with users.
Most of the model welfare concerns Anthropic is citing trace back to what they discussed in the Claude 4 Model System Card.
Claude's aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.
I think this may be the first real chance to measure public sentiment on a Model Welfare measure that even slightly inconveniences human users, so I want to document the reactions I see here on LW. I source these reactions primarily from X, so they may reflect algorithmic bias.
On X [...]
---
First published:
Source:
Linkpost URL: https://www.anthropic.com/research/end-subset-conversations
---
Narrated by TYPE III AUDIO.
By LessWrong