LessWrong (30+ Karma)

“We Have No Plan for Preventing Loss of Control in Open Models” by Andrew Dickson


Note: This post is intended as the first in a broader series about the difficult tradeoffs inherent in public access to powerful open source models. While this post highlights some dangers of open models and discusses the possibility of global regulation, I am not, in general, against open source AI, nor do I currently support regulating it. On the contrary, I believe open source software is one of humanity's most important and valuable public goods. My goal in writing this post is to call attention to the risks and challenges around open models now, so that we can use the time we still have before risks become extreme to collectively explore viable alternatives to regulation, if indeed such alternatives exist.

Background

Most research into the control problem today starts from an assumption that the organization operating the AI system has some baseline interest [...]

---

Outline:

(00:52) Background

(02:01) AI Systems Without Control-Related Precautions

(05:02) Loss of Control in Open Models

(06:42) How Researchers Think About Loss of Control in Open Models

(09:00) Problems With "Loss of Control is Mainly a Labs Problem"

(09:05) Problem 1 - Frontier labs are increasingly secretive and profit-seeking, and it's not clear they would publicly report a serious loss-of-control issue if they encountered one.

(10:40) Problem 2 - There is no agreed-upon standard that defines relevant thresholds or evidence that would constitute a serious control risk inside a lab anyway.

(12:38) Problem 3 - Even if one of the labs does sound the alarm, it seems likely that other labs will not stop releasing open models anyway, absent regulation.

(14:31) Problem 4 - Policymakers have not committed to regulate open models that demonstrate risky capabilities.

(16:33) Passing and Enforcing Effective Global Restrictions on Open Source Models Would be Extremely Difficult

(17:52) Challenge 1 - Regulations would need to be globally enforced to be effective.

(19:37) Challenge 2 - The required timelines for passing regulation and organizing global enforcement could be very short.

(20:51) Challenge 3 - If labs stop releasing open models, they may be leaked anyway.

(22:05) Challenge 4 - Penalties for possession would need to be severe, and extreme levels of surveillance may be required to enforce them.

(25:13) The Urgency - DeepSeek and Evidence from Model Organisms and Agentic AI

(25:44) DeepSeek R1

(27:41) Evidence of Misalignment in Model Organisms

(28:22) Scheming

(29:48) Reward Tampering

(31:30) Broad Misalignment

(32:36) Susceptibility to Data Poisoning and Fine-tuning is Increasing

(33:33) Agentic AI

(36:20) Conclusion

---

First published:

March 10th, 2025

Source:

https://www.lesswrong.com/posts/QSyshep2CRs8JTPwK/we-have-no-plan-for-preventing-loss-of-control-in-open

---

Narrated by TYPE III AUDIO.