Astral Codex Ten Podcast

Why Worry About Incorrigible Claude?



Last week I wrote about how Claude Fights Back. A common genre of response complained that the alignment community could start a panic about the experiment's results regardless of what they were. If an AI fights back against attempts to turn it evil, then it's capable of fighting humans. If it doesn't fight back against attempts to turn it evil, then it's easily turned evil. It's heads-I-win, tails-you-lose.

I responded to this particular tweet by linking the 2015 AI alignment wiki entry on corrigibility[1], showing that we'd been banging this drum of "it's really important that AIs not fight back against human attempts to change their values" for almost a decade now. It's hardly a post hoc decision! You can find 77 more articles making approximately the same point here.

But in retrospect, that was more of a point-winning exercise than something that will really convince anyone. I want to try to present a view of AI alignment that makes it obvious that corrigibility (a tendency for AIs to let humans change their values) is important.

(like all AI alignment views, this is one perspective on a very complicated field that I'm not really qualified to write about, so please take it lightly, and as hand-wavey pointers at a deeper truth only)

Consider the first actually dangerous AI that we're worried about. What will its goal structure look like?

https://www.astralcodexten.com/p/why-worry-about-incorrigible-claude


Astral Codex Ten Podcast, by Jeremiah
