LessWrong (30+ Karma)

“Power-seeking agents will likely be developed” by Alec Harris



I am going to argue that we will likely eventually get AIs that are strongly power-seeking, much more so than current SOTA LLMs.[1]

TLDR

  1. Right now SOTA LLMs are still largely in a simulator regime. This buffers against power-seeking.
  2. Long-horizon RL or similar methods (applied to LLMs or otherwise) will turn AIs into consequentialists, motivating power-seeking.
  3. It will likely be difficult to prevent other actors from building consequentialist AI without leading labs being prepared to do so themselves.

Instrumental convergence does not apply to pretraining

LLM pretraining and SFT can be understood as creating a simulator. The model learns to imitate the continuation of the training distribution conditioned on the prompt. Note that a simulator, in this sense, does not optimize for simulation[2]; for example, it will not be inclined to harvest compute to improve its simulations. This is because simulators are consequence-blind: they don’t take into account the effects of their actions on the future. My favorite way to see this is that the gradients don’t flow through the conditioning context (the previous tokens), which is treated as a constant.

So even if altering the parameters would change the previous tokens and thereby improve the current prediction, the [...]
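The point about gradients can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than anything from the original post: a one-parameter "model" `predict`, a squared-error loss standing in for next-token loss, and the hypothetical names `loss_teacher_forced` and `loss_through_history`. The first loss treats the previous token as fixed data, as in pretraining; the second lets the previous token depend on the parameter, which is what a consequence-aware objective would do. Gradients are estimated by finite differences so the example needs no libraries.

```python
# Toy illustration (all names and the model are hypothetical, not from the post).

def predict(theta, prev_token):
    # One-parameter stand-in for a model: the next "token" is theta * previous token.
    return theta * prev_token

def loss_teacher_forced(theta, prev_token, target):
    # Pretraining-style loss: prev_token is data, a constant with respect to theta.
    return (predict(theta, prev_token) - target) ** 2

def loss_through_history(theta, first_token, target):
    # Hypothetical consequence-aware loss: the previous token is the model's
    # own earlier output, so it changes when theta changes.
    prev_token = predict(theta, first_token)
    return (predict(theta, prev_token) - target) ** 2

def grad(f, theta, eps=1e-6):
    # Central finite-difference estimate of df/dtheta.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = 0.5
first_token = 2.0
target = 3.0

# Freeze the previous token at the value it takes under the current theta.
frozen_prev = predict(theta, first_token)

g_sim = grad(lambda t: loss_teacher_forced(t, frozen_prev, target), theta)
g_cons = grad(lambda t: loss_through_history(t, first_token, target), theta)

print("teacher-forced gradient:", round(g_sim, 4))
print("gradient through history:", round(g_cons, 4))
```

Both losses have the same value at the current `theta`, but their gradients differ: the teacher-forced gradient ignores how `theta` would have changed the previous token, while the second includes that effect. In this toy the teacher-forced gradient is about -5 and the gradient through history about -10.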

---

Outline:

(00:46) Instrumental convergence does not apply to pretraining

(02:28) Long-horizon optimization leads to consequentialism

(05:29) Consequentialism is useful

The original text contained 5 footnotes which were omitted from this narration.

---

First published:

May 20th, 2026

Source:

https://www.lesswrong.com/posts/CtnHpECuoq6eLL8fu/power-seeking-agents-will-likely-be-developed

---

Narrated by TYPE III AUDIO.
