
The Center for AI Safety just dropped a fascinating paper — they discovered that today’s AIs like GPT-4 and Claude have preferences! As in, coherent utility functions. We knew this was inevitable, but we didn’t know it was already happening.
This episode has two parts:
In Part I (48 minutes), I react to David Shapiro’s coverage of the paper and push back on many of his points.
In Part II (60 minutes), I explain the paper myself.
00:00 Episode Introduction
05:25 PART I: REACTING TO DAVID SHAPIRO
10:06 Critique of David Shapiro's Analysis
19:19 Reproducing the Experiment
35:50 David's Definition of Coherence
37:14 Does AI have “Temporal Urgency”?
40:32 Universal Values and AI Alignment
49:13 PART II: EXPLAINING THE PAPER
51:37 How The Experiment Works
01:11:33 Instrumental Values and Coherence in AI
01:13:04 Exchange Rates and AI Biases
01:17:10 Temporal Discounting in AI Models
01:19:55 Power Seeking, Fitness Maximization, and Corrigibility
01:20:20 Utility Control and Bias Mitigation
01:21:17 Implicit Association Test
01:28:01 Emailing with the Paper’s Authors
01:43:23 My Takeaway
Show Notes
David’s source video: https://www.youtube.com/watch?v=XGu6ejtRz-0
The research paper: http://emergent-values.ai
Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence
PauseAI, the volunteer organization I’m part of: https://pauseai.info
Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!
Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.
Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates