LessWrong (30+ Karma)

“Control protocols don’t always need to know which models are scheming” by Fabien Roger


Listen Later

These are my personal views.

To detect if an agent is taking a catastrophically dangerous action, you might want to monitor its actions using the smartest model that is too weak to be a schemer. But knowing what models are weak enough that they are unlikely to scheme is difficult, which puts you in a difficult spot: take a model too strong and it might actually be a schemer and lie to you, take a model too weak and it might fail just because it's too dumb to notice the danger. So instead you can just use all models as monitors at once and block actions if any of the monitors flag it as dangerous[1].

In some situations, this lets you build protocols that you can implement without knowing which models are scheming and that get the same level of safety as if you perfectly knew which models were scheming (though it doesn’t let you know what level of safety you actually have).

This post presents some toy models illustrating the benefits of ensembling monitors of various trustedness and explains what consequences it might have in practice to be using protocols that are somewhat agnostic about which of the [...]

---

Outline:

(01:30) Toy models

(01:33) The n=2 case

(03:52) Continuum of models

(04:50) Ensembling monitors in practice

The original text contained 1 footnote which was omitted from this narration.

---

First published:

April 26th, 2026

Source:

https://www.lesswrong.com/posts/WiYDawNhFp5cM7uqF/control-protocols-don-t-always-need-to-know-which-models-are

---

Narrated by TYPE III AUDIO.

...more
View all episodesView all episodes
Download on the App Store

LessWrong (30+ Karma)By LessWrong


More shows like LessWrong (30+ Karma)

View all
The Daily by The New York Times

The Daily

112,347 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

130 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,244 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

560 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,327 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners