
Reasoning models like Deepseek r1:
If you had told this to my 2022 self without specifying anything else about scheming models, I might have put a non-negligible probability on such AIs scheming (i.e. strategically performing well in training in order to protect their long-term goals).
Despite this, the scratchpads of current reasoning models do not contain traces of scheming in regular training environments, even when there is no harmlessness pressure on the scratchpads, as with Deepseek-r1-Zero.
In this post, I argue that:
---
Outline:
(04:08) Classic reasons to expect AIs to not be schemers
(04:14) Speed priors
(06:11) Preconditions for scheming not being met
(08:27) There are indirect pressures against scheming on intermediate steps of reasoning
(09:07) Human priors on intermediate steps of reasoning
(11:43) Correlation between short and long reasoning
(13:07) Other pressures
(13:48) Rewards are not so cursed as to strongly incentivize scheming
(13:54) Maximizing rewards teaches you things mostly independent of scheming
(14:46) Using situational awareness to get higher reward is hard
(16:45) Maximizing rewards doesn't push you far away from the human prior
(18:07) Will it be different for future rewards?
(19:32) Meta-level update and conclusion
The original text contained 1 footnote which was omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
