
The Center for AI Safety just dropped a fascinating paper — they discovered that today’s AIs like GPT-4 and Claude have preferences! As in, coherent utility functions. We knew this was inevitable, but we didn’t know it was already happening.
This episode has two parts:
In Part I (48 minutes), I react to David Shapiro’s coverage of the paper and push back on many of his points.
In Part II (60 minutes), I explain the paper myself.
00:00 Episode Introduction
05:25 PART I: REACTING TO DAVID SHAPIRO
10:06 Critique of David Shapiro's Analysis
19:19 Reproducing the Experiment
35:50 David's Definition of Coherence
37:14 Does AI have “Temporal Urgency”?
40:32 Universal Values and AI Alignment
49:13 PART II: EXPLAINING THE PAPER
51:37 How The Experiment Works
01:11:33 Instrumental Values and Coherence in AI
01:13:04 Exchange Rates and AI Biases
01:17:10 Temporal Discounting in AI Models
01:19:55 Power Seeking, Fitness Maximization, and Corrigibility
01:20:20 Utility Control and Bias Mitigation
01:21:17 Implicit Association Test
01:28:01 Emailing with the Paper’s Authors
01:43:23 My Takeaway
Show Notes
David’s source video: https://www.youtube.com/watch?v=XGu6ejtRz-0
The research paper: http://emergent-values.ai
Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence
PauseAI, the volunteer organization I’m part of: https://pauseai.info
Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!
Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.
Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates