February 01, 2026

“An Explication of Alignment Optimism” by Oliver Daniels

2 minutes

Some people have been getting more optimistic about alignment. But from a skeptical / high p(doom) perspective, justifications for this optimism seem lacking.

"Claude is nice and can kinda do moral philosophy" just doesn't address the concern that lots of long-horizon RL + self-reflection will lead to misaligned consequentialists (cf. Hubinger).

So I think the casual alignment optimists aren't doing a great job of arguing their case. Still, it feels like there's an optimistic update somewhere in the current trajectory of AI development.

It really is kinda crazy how capable current models are, and how much I basically trust them. Paradoxically, most of this trust comes from lack of capabilities (current models couldn't seize power right now if they tried).

...and I think this is the positive update. It feels very plausible, in a visceral way, that the first economically transformative AI systems could be, in many ways, really dumb.

Slow takeoff implies that we'll get the stupidest possible transformative AI first. Moravec's paradox leads to a similar conclusion. Calling LLMs a "cultural technology" can be a form of AI denialism, but there's still an important truth there. If the secret [...]

---

First published:

January 31st, 2026

Source:

https://www.lesswrong.com/posts/RmsaYnHPBeagg8Giw/an-explication-of-alignment-optimism

---

Narrated by TYPE III AUDIO.

By LessWrong