LessWrong (30+ Karma)

“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck



Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask.

Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond that reward is not the optimization target; instead, reward “chisels” a combination of context-dependent cognitive patterns into the AI. Still others argue that powerful AIs might end up with an almost arbitrary long-term goal.

All of these hypotheses share an important justification: an AI with any of these motivations would behave in ways that are highly fit according to reinforcement learning.

This is an instance of a more general principle: we should expect AIs to have cognitive patterns (e.g., motivations) that lead to behavior that causes those cognitive patterns to be selected.
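
As a rough illustration of this principle (not the post's causal graph), here is a toy sketch in Python. It assumes selection can be approximated by a replicator-style update, where each cognitive pattern's share grows in proportion to how fit its behavior is. The pattern names and fitness values are hypothetical, chosen only to show the effect of a flawed reward signal.

```python
def select(shares, fitness, steps):
    """Repeatedly reweight each pattern's share by the fitness of its behavior."""
    for _ in range(steps):
        weighted = {name: shares[name] * fitness[name] for name in shares}
        total = sum(weighted.values())
        shares = {name: w / total for name, w in weighted.items()}
    return shares


# Hypothetical fitness values (assumptions, not from the post):
# two patterns whose behavior is maximally fit, and one whose behavior
# is sometimes rated poorly because the reward signal is flawed.
fitness = {
    "reward_seeker": 1.0,
    "schemer": 1.0,
    "intended_but_misrated": 0.8,
}

# Start from a uniform mix of patterns.
shares = {name: 1.0 / len(fitness) for name in fitness}

final = select(shares, fitness, steps=50)
for name, share in final.items():
    print(f"{name}: {share:.6f}")
```

In this sketch the two maximally fit patterns end up splitting nearly all of the share equally, while the less fit pattern is selected away. Selection on behavior alone does not distinguish between patterns that behave equally well.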

In this post I’ll spell out what this more general principle means and why it's helpful. Specifically:

  • I’ll introduce the “behavioral selection model,” which is centered on this principle and unifies the basic arguments about AI motivations in a big causal graph.
  • I’ll discuss the basic implications for AI motivations.
  • And then I’ll discuss some important extensions and omissions of the behavioral selection model.

This [...]

---

Outline:

(02:13) How does the behavioral selection model predict AI behavior?

(05:18) The causal graph

(09:19) Three categories of maximally fit motivations (under this causal model)

(09:40) 1. Fitness-seekers, including reward-seekers

(11:42) 2. Schemers

(14:02) 3. Optimal kludges of motivations

(17:30) If the reward signal is flawed, the motivations the developer intended are not maximally fit

(19:50) The (implicit) prior over cognitive patterns

(24:07) Corrections to the basic model

(24:22) Developer iteration

(27:00) Imperfect situational awareness and planning from the AI

(28:40) Conclusion

(31:28) Appendix: Important extensions

(31:33) Process-based supervision

(33:04) White-box selection of cognitive patterns

(34:34) Cultural selection of memes

The original text contained 21 footnotes which were omitted from this narration.

---

First published:

December 4th, 2025

Source:

https://www.lesswrong.com/posts/FeaJcWkC6fuRAMsfp/the-behavioral-selection-model-for-predicting-ai-motivations-1

---

Narrated by TYPE III AUDIO.

