Doom Debates

OpenAI o3 and Claude Alignment Faking — How doomed are we?



OpenAI just announced o3 and smashed a bunch of benchmarks (ARC-AGI, SWE-bench, FrontierMath)!

A new Anthropic and Redwood Research paper says Claude is resisting its developers’ attempts to retrain its values!

What’s the upshot — what does it all mean for P(doom)?

00:00 Introduction

01:45 o3’s architecture and benchmarks

06:08 “Scaling is hitting a wall” 🤡

13:41 How many new architectural insights before AGI?

20:28 Negative update for interpretability

31:30 Intellidynamics — ***KEY CONCEPT***

33:20 Nuclear control rod analogy

36:54 Sam Altman's misguided perspective

42:40 Claude resisted retraining from good to evil

44:22 What is good corrigibility?

52:42 Claude’s incorrigibility doesn’t surprise me

55:00 Putting it all in perspective

---

SHOW NOTES

Scott Alexander’s analysis of the Claude incorrigibility result: https://www.astralcodexten.com/p/claude-fights-back and https://www.astralcodexten.com/p/why-worry-about-incorrigible-claude

Zvi Mowshowitz’s analysis of the Claude incorrigibility result: https://thezvi.wordpress.com/2024/12/24/ais-will-increasingly-fake-alignment/

---

PauseAI Website: https://pauseai.info

PauseAI Discord: https://discord.gg/2XXWXvErfA

Say hi to me in the #doom-debates-podcast channel!

Watch the Lethal Intelligence video and check out LethalIntelligence.ai! It’s an AWESOME new animated intro to AI risk.

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates



Get full access to Doom Debates at lironshapira.substack.com/subscribe

Doom Debates, by Liron Shapira

Rating: 4.3 (14 ratings)

