LessWrong (30+ Karma)

“We Have No Plan for Preventing Loss of Control in Open Models” by Andrew Dickson


Note: This post is intended as the first in a broader series about the difficult tradeoffs inherent in public access to powerful open source models. While this post highlights some dangers of open models and discusses the possibility of global regulation, I am not, in general, against open source AI, nor do I currently support regulating it. On the contrary, I believe open source software is one of humanity's most important and valuable public goods. My goal in writing this post is to call attention to the risks and challenges around open models now, so that we can use the time we still have before risks become extreme to collectively explore viable alternatives to regulation, if indeed such alternatives exist.

Background

Most research into the control problem today starts from an assumption that the organization operating the AI system has some baseline interest [...]

---

Outline:

(00:52) Background

(02:01) AI Systems Without Control-Related Precautions

(05:02) Loss of Control in Open Models

(06:42) How Researchers Think About Loss of Control in Open Models

(09:00) Problems With "Loss of Control is Mainly a Labs Problem"

(09:05) Problem 1 - Frontier labs are increasingly secretive and profit-seeking, and it's not clear they would publicly report a serious loss-of-control issue if they encountered one.

(10:40) Problem 2 - There is no agreed-upon standard that defines relevant thresholds or evidence that would constitute a serious control risk inside a lab anyway.

(12:38) Problem 3 - Even if one of the labs does sound the alarm, it seems likely that other labs will not stop releasing open models anyway, absent regulation.

(14:31) Problem 4 - Policymakers have not committed to regulate open models that demonstrate risky capabilities.

(16:33) Passing and Enforcing Effective Global Restrictions on Open Source Models Would be Extremely Difficult

(17:52) Challenge 1 - Regulations would need to be globally enforced to be effective.

(19:37) Challenge 2 - The required timelines for passing regulation and organizing global enforcement could be very short.

(20:51) Challenge 3 - If labs stop releasing open models, they may be leaked anyway.

(22:05) Challenge 4 - Penalties for possession would need to be severe, and extreme levels of surveillance may be required to enforce them.

(25:13) The Urgency - DeepSeek and Evidence from Model Organisms and Agentic AI

(25:44) DeepSeek R1

(27:41) Evidence of Misalignment in Model Organisms

(28:22) Scheming

(29:48) Reward Tampering

(31:30) Broad Misalignment

(32:36) Susceptibility to Data Poisoning and Fine-tuning is Increasing

(33:33) Agentic AI

(36:20) Conclusion

---

First published:

March 10th, 2025

Source:

https://www.lesswrong.com/posts/QSyshep2CRs8JTPwK/we-have-no-plan-for-preventing-loss-of-control-in-open

---

Narrated by TYPE III AUDIO.