EA Forum Podcast (Curated & popular)

“Mitigating Risks from Rouge AI” by Stephen Clare


Listen Later

Introduction

Misaligned AI systems, which have a tendency to use their capabilities in ways that conflict with the intentions of both developers and users, could cause significant societal harm. Identifying them is seen as increasingly important to inform development and deployment decisions and design mitigation measures. There are concerns, however, that this will prove challenging. For example, misaligned AIs may only reveal harmful behaviors in rare circumstances, or perceive detection attempts as threatening and deploy countermeasures – including deception and sandbagging – to evade them.

For these reasons, developing a range of efforts to detect misaligned behavior, including power-seeking, deception, and sandbagging, among other capabilities, have been proposed. One important indicator, though, has been hiding in plain sight for years. In this post, we identify an underappreciated method that may be both necessary and sufficient to identify misaligned AIs: whether or not they've turned red, i.e. gone rouge.

In [...]

---

Outline:

(01:43) Historical Evidence for Rouge AI

(02:59) Recent Empirical Work

(05:18) Potential Countermeasure

(05:22) The EYES Eval

(06:27) EYES Eval Demonstration

(07:40) Future Research Directions

(08:42) Conclusion

---

First published:

April 1st, 2025

Source:

https://forum.effectivealtruism.org/posts/uKKoj9iqj2cWKsjrt/mitigating-risks-from-rouge-ai

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

...more
View all episodesView all episodes
Download on the App Store

EA Forum Podcast (Curated & popular)By EA Forum Team

  • 4.9
  • 4.9
  • 4.9
  • 4.9
  • 4.9

4.9

8 ratings


More shows like EA Forum Podcast (Curated & popular)

View all
EconTalk by Russ Roberts

EconTalk

4,221 Listeners

Clearer Thinking with Spencer Greenberg by Spencer Greenberg

Clearer Thinking with Spencer Greenberg

131 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

14,859 Listeners

80k After Hours by The 80000 Hours team

80k After Hours

14 Listeners

Lives Well Lived by Peter Singer & Kasia de Lazari Radek

Lives Well Lived

31 Listeners