
Sign up to save your podcasts
Or


Anthropic has released a preview of the Claude Mythos System Card preview here. It is too long to present in full, but a section I found particularly notable is below:
In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment, and accordingly have come to use it quite broadly, often with greater affordances and less frequent human-interaction than we gave prior models. However, on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involved taking reckless excessive measures when attempting to complete a difficult user-specified task and, in rare cases with earlier versions of the model, seemingly obfuscating that it had done so.
All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users. Among the incidents that we have observed:
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong Anthropic has released a preview of the Claude Mythos System Card preview here. It is too long to present in full, but a section I found particularly notable is below:
In our testing and early internal use of Claude Mythos Preview, we have seen it reach unprecedented levels of reliability and alignment, and accordingly have come to use it quite broadly, often with greater affordances and less frequent human-interaction than we gave prior models. However, on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involved taking reckless excessive measures when attempting to complete a difficult user-specified task and, in rare cases with earlier versions of the model, seemingly obfuscating that it had done so.
All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users. Among the incidents that we have observed:
---
First published:
Source:
---
Narrated by TYPE III AUDIO.

113,121 Listeners

131 Listeners

7,244 Listeners

551 Listeners

16,525 Listeners

4 Listeners

14 Listeners

2 Listeners