


I am going to argue that we will likely eventually get AIs that are strongly power-seeking, much more so than current SOTA LLMs.[1]
TLDR
Instrumental convergence does not apply to pretraining
LLM pretraining and SFT can be understood as creating a simulator. The model learns to imitate the continuation of the training distribution conditioned on the prompt. Note that a simulator, in this sense, does not optimize for simulation[2]; for example, it will not be inclined to harvest compute to improve its simulations. This is because simulators are consequence-blind: they don’t take into account the effects of their actions on the future. My favorite way to see this is that the gradients don’t flow through the conditional (the previous tokens), which is treated as a constant.
So even if altering the parameters would change the previous tokens and thereby improve the current prediction, the [...]
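To make the "gradients don't flow through the conditional" point concrete, here is a minimal sketch. It is not from the original post: the scalar "model" `theta`, the continuous stand-in "tokens", and the function names are all hypothetical simplifications. It compares the gradient of a teacher-forced loss, where the context is fixed data, with the gradient of a rollout loss, where the model's own outputs are fed back in as context.

```python
# Toy illustration (assumed setup, not the author's): a one-parameter model
# predicts the next value as theta * previous_value.

def teacher_forced_loss(theta, data):
    """Next-step squared error; the context always comes from the data,
    so it is a constant with respect to theta."""
    return sum((theta * prev - nxt) ** 2 for prev, nxt in zip(data, data[1:]))

def rollout_loss(theta, data):
    """Same error, but each prediction becomes the next context,
    so the context itself depends on theta."""
    context, total = data[0], 0.0
    for target in data[1:]:
        pred = theta * context
        total += (pred - target) ** 2
        context = pred  # the consequence of the earlier output is now visible
    return total

def numerical_grad(loss_fn, theta, data, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(theta)."""
    return (loss_fn(theta + eps, data) - loss_fn(theta - eps, data)) / (2 * eps)

data = [1.0, 2.0, 3.0]
theta = 1.5

g_teacher_forced = numerical_grad(teacher_forced_loss, theta, data)  # about -1.0
g_rollout = numerical_grad(rollout_loss, theta, data)                # about -5.5
print(g_teacher_forced, g_rollout)
```

The two gradients differ only because the rollout loss includes the path from `theta` through the earlier prediction into the later one. The teacher-forced loss has no such path, which is the sense in which the training signal is blind to the downstream effects of the model's own outputs.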
---
Outline:
(00:46) Instrumental convergence does not apply to pretraining
(02:28) Long-horizon optimization leads to consequentialism
(05:29) Consequentialism is useful
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
