January 08, 2026

“My 2003 Post on the Evolutionary Argument for AI Misalignment” by Wei Dai

4 minutes

This was posted to SL4 on the last day of 2003. I had largely forgotten about it until I saw the LW Wiki reference it under Mesa Optimization[1]. Besides the reward hacking angle, which is now well-trodden, it gave an argument based on the relationship between philosophy, memetics, and alignment, which has been much less discussed (including in current discussions about human fertility decline) and is perhaps still worth reading and thinking about. Overall, the post seems to have aged well, aside from the very last paragraph.

For historical context, Eliezer had coined "Friendly AI" in Creating Friendly AI 1.0 in June 2001. Although most of it was very hard to understand and was subsequently disavowed by Eliezer himself, it had a section on “philosophical crisis”[2] which probably influenced a lot of my subsequent thinking, including this post. What's now called The Sequences would start being posted to OB/LW in 2006.

After 2003, I think this post went mostly unnoticed and forgotten (including by me), and MIRI probably reinvented the idea of mesa-optimization/inner-misalignment circa 2016. I remember hearing people talk about inner vs outer alignment while attending a (mostly unrelated) decision theory workshop at MIRI and having an "oh this is new" reaction.

The [...]

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

January 6th, 2026

Source:

https://www.lesswrong.com/posts/AjxwuLYvSCnf6KZMn/my-2003-post-on-the-evolutionary-argument-for-ai

---

Narrated by TYPE III AUDIO.
