March 04, 2025

“For scheming, we should first focus on detection and then on prevention” by Marius Hobbhahn

9 minutes

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research.

If we want to argue that the risk of harm from scheming in an AI system is low, we could, among others, make the following arguments:

Detection: If our AI system is scheming, we have good reasons to believe that we would be able to detect it.
Prevention: We have good reasons to believe that our AI system has a low scheming propensity or that we could stop scheming actions before they cause harm.

In this brief post, I argue that we should first prioritize detection over prevention, assuming we cannot pursue both at the same time, e.g. due to limited resources. In short, a) early on, the information value is more important than risk reduction because current models are unlikely to cause big harm but we can already learn a lot [...]

---

Outline:

(01:07) Techniques

(04:41) Reasons to prioritize detection over prevention

---

First published:

March 4th, 2025

Source:

https://www.lesswrong.com/posts/bAWPsgbmtLf8ptay6/for-scheming-we-should-first-focus-on-detection-and-then-on

---

Narrated by TYPE III AUDIO.

LessWrong (30+ Karma)

By LessWrong
