
By Eleos AI
Read the post here.
Note: This is also posted on Robert Long's Substack.
Intro
Last week, Anthropic announced that its newest language models, Claude Opus 4 and 4.1, can now shut down certain conversations with users. The announcement explains that Anthropic gave Claude this ability “as part of our exploratory work on potential AI welfare”.
This means that, for the first time, a major AI company has changed how it treats its AI systems out of concern for the well-being of the systems themselves, not just user safety. Whether or not you think Claude is or will be conscious—Anthropic themselves say that they are “deeply uncertain”—this decision is a notable moment in the history of human-AI interactions.
Some will see this as much ado about nothing. Others will see it as pernicious: hype, a distraction from more important issues, and an exacerbation of already-dangerous anthropomorphism. Still others, a considerably smaller group, think that LLMs are obviously already conscious, and so this move is woefully insufficient.
I think it’s more mundane: Anthropic is responding in a fairly measured way to genuine uncertainty about a morally significant question, and attempting to set a good precedent. For the most part, this intervention’s success won’t depend on how it affects Claude Opus 4.1; it will depend on how people react to it and the precedent it sets.
Although we don’t know how that will pan out, and there are reasons to worry about backlash, I think that this was a good move.