
Recently, Joe Carlsmith moved to Anthropic. He joins other members of the broader EA and Open Philanthropy ecosystem working at the AI lab, such as Holden Karnofsky. And of course, many of the original founders were EA-affiliated.
In short, I think Anthropic is honest and is attempting to be an ethical AI lab, but they are deeply mistaken about the difficulty of the problem they face, and they are dangerously distorting the AI safety ecosystem. My guess is that Anthropic is, for the most part, being internally honest and not consciously trying to deceive anyone. When they say they believe in being responsible, that's what they genuinely believe.
My criticism of Anthropic is that they lack a promising plan and are creating a dangerous counter-narrative to AI safety efforts. Developing AI gradually, running evaluations, and doing interpretability work is simply not enough to build safe superintelligence. With the methods we have, we're just not going to reach safe superintelligence. Gradual development (as in responsible scaling policies, RSPs) offers only a small benefit: on a gradual scale you may be able to see problems emerge, but that doesn't tell you how to solve them. The same goes for [...]
---
Outline:
(01:33) We only get one critical try to test our methods
(03:12) Anything close to current methods won't be enough
(05:44) Three Groups and the Counter-Narrative
(07:32) Will Anthropic give us evidence to stop?
---
---
Narrated by TYPE III AUDIO.
By LessWrong
