April 02, 2025

“Is there instrumental convergence for virtues?” by mattmacdermott

2 minutes

A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, including things like "gain as much power as possible".

If it weren't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.

For idealised pure consequentialists -- agents that have an outcome they want to bring about, and do whatever they think will cause it -- some version of instrumental convergence seems surely true[1].

But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do we still have to worry that unless such AIs are motivated [...]

The original text contained 1 footnote which was omitted from this narration.

---

First published:

April 2nd, 2025

Source:

https://www.lesswrong.com/posts/6Syt5smFiHfxfii4p/is-there-instrumental-convergence-for-virtues

---

Narrated by TYPE III AUDIO.

By LessWrong