February 16, 2026

“Aligning to Virtues” by Richard_Ngo

7 minutes

Which alignment target?

Suppose you’re a lab or government, and you want to figure out what values to align your AI to. Here are three options, and some of their downsides:

AIs that are aligned to a set of consequentialist values are incentivized to acquire power to pursue those values. This creates power struggles between those AIs and:

Humans who don’t share those values.
Humans who disagree with the AI about how to pursue those values.
Humans who don’t trust that the AI will actually pursue its stated values after gaining power.

This is true whether those values are misaligned with all humans, aligned with some humans, chosen by aggregating all humans’ values, or an attempt to specify some “moral truth”. In general, since humans have many different values, I think of the power struggle as being between coalitions which each contain some humans and some AIs.

AIs that are aligned to a set of deontological principles (like refusing to harm humans) are safer, but also less flexible. What’s fine for an AI to do in one context might be harmful in another context; what’s fine for one AI to do might be very harmful for a million [...]

---

Outline:

(00:09) Which alignment target?

(02:37) Aligning to virtues

---

First published:

February 16th, 2026

Source:

https://www.lesswrong.com/posts/5CZoEw7sjxnMrhgvx/aligning-to-virtues

---

Narrated by TYPE III AUDIO.

By LessWrong