LessWrong (30+ Karma)

“Optimisation over non-stationary distributions creates weirder minds” by Samuel Ratnam, Pjain


Listen Later

TLDR: Sequentially mixing training objectives incentivises different training dynamics depending on the distinguishability of the training environments and the amount of pressure for shared circuitry. We classify these patterns into three classes: ecological generalists, conditional policies, and strategy churn. We suggest careful consideration of the pressures of training dynamics can allow us to shape the minds of AI systems in more intentional and fine-grained ways.

Modern LLM post-training involves interleaving many different training objectives such as mixing different reward functions with supervised fine tuning (SFT), typically in order to guard against catastrophic forgetting. However, a lot of existing theoretical work in AI safety operates under the assumption of a fixed training objective and distribution. What happens when we drop that assumption?

A common intuition is that mixing training objectives merely selects for an optimiser over a weighted sum of these training objectives, so this problem nicely reduces down to single objective optimisation. On the other hand, you might suspect that the most recent objective is the only thing that matters for reasoning about a system's properties. In practice, things are not quite so simple.

Intensive optimisation over any given distribution creates fragility due to Goodhart's law. A circuit that [...]

The original text contained 3 footnotes which were omitted from this narration.

---

First published:

June 5th, 2026

Source:

https://www.lesswrong.com/posts/bzkQgYhaCcpvKqmAK/optimisation-over-non-stationary-distributions-creates

---

Narrated by TYPE III AUDIO.

...more
View all episodesView all episodes
Download on the App Store

LessWrong (30+ Karma)By LessWrong


More shows like LessWrong (30+ Karma)

View all
The Daily by The New York Times

The Daily

112,279 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

130 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,248 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

564 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,340 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners