
Sign up to save your podcasts
Or


Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask.
Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI. Some argue that powerful AIs might end up with an almost arbitrary long-term goal.
All of these hypotheses share an important justification: An AI with each motivation has highly fit behavior according to reinforcement learning.
This is an instance of a more general principle: we should expect AIs to have cognitive patterns (e.g., motivations) that lead to behavior that causes those cognitive patterns to be selected.
In this post I’ll spell out what this more general principle means and why it's helpful. Specifically:
This [...]
---
Outline:
(02:13) How does the behavioral selection model predict AI behavior?
(05:18) The causal graph
(09:19) Three categories of maximally fit motivations (under this causal model)
(09:40) 1. Fitness-seekers, including reward-seekers
(11:42) 2. Schemers
(14:02) 3. Optimal kludges of motivations
(17:30) If the reward signal is flawed, the motivations the developer intended are not maximally fit
(19:50) The (implicit) prior over cognitive patterns
(24:07) Corrections to the basic model
(24:22) Developer iteration
(27:00) Imperfect situational awareness and planning from the AI
(28:40) Conclusion
(31:28) Appendix: Important extensions
(31:33) Process-based supervision
(33:04) White-box selection of cognitive patterns
(34:34) Cultural selection of memes
The original text contained 21 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrongHighly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask.
Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI. Some argue that powerful AIs might end up with an almost arbitrary long-term goal.
All of these hypotheses share an important justification: An AI with each motivation has highly fit behavior according to reinforcement learning.
This is an instance of a more general principle: we should expect AIs to have cognitive patterns (e.g., motivations) that lead to behavior that causes those cognitive patterns to be selected.
In this post I’ll spell out what this more general principle means and why it's helpful. Specifically:
This [...]
---
Outline:
(02:13) How does the behavioral selection model predict AI behavior?
(05:18) The causal graph
(09:19) Three categories of maximally fit motivations (under this causal model)
(09:40) 1. Fitness-seekers, including reward-seekers
(11:42) 2. Schemers
(14:02) 3. Optimal kludges of motivations
(17:30) If the reward signal is flawed, the motivations the developer intended are not maximally fit
(19:50) The (implicit) prior over cognitive patterns
(24:07) Corrections to the basic model
(24:22) Developer iteration
(27:00) Imperfect situational awareness and planning from the AI
(28:40) Conclusion
(31:28) Appendix: Important extensions
(31:33) Process-based supervision
(33:04) White-box selection of cognitive patterns
(34:34) Cultural selection of memes
The original text contained 21 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

26,370 Listeners

2,450 Listeners

8,708 Listeners

4,174 Listeners

93 Listeners

1,599 Listeners

9,855 Listeners

93 Listeners

507 Listeners

5,529 Listeners

16,019 Listeners

543 Listeners

136 Listeners

94 Listeners

475 Listeners