February 06, 2025

Toy Model of the AI Control Problem

25 minutes

Why does the simplest AI imaginable, when you ask it to help you push a box around a grid, suddenly want you to die?

AI doomers are often dismissed as having "no evidence" or as merely "anthropomorphizing." This toy model will help you understand why a drive to eliminate humans is NOT handwavy anthropomorphic speculation, but rather something we expect by default from any sufficiently powerful search algorithm.

We’re not talking about AGI or ASI here — we’re just looking at an AI that does brute-force search over actions in a simple grid world.
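To make "brute-force search over actions" concrete, here is a minimal Python sketch. It is not the model from the slide deck: the 4x4 grid, the box-pushing rule, the goal cell, and the names `step` and `brute_force_plan` are all assumptions made up for illustration. It only shows the general shape of such a search: enumerate every action sequence, simulate it, and keep the first one that satisfies the objective.

```python
from itertools import product

SIZE = 4  # hypothetical 4x4 grid; the real toy model's layout may differ
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}


def in_bounds(pos):
    x, y = pos
    return 0 <= x < SIZE and 0 <= y < SIZE


def step(agent, box, action):
    """Apply one action; walking into the box pushes it one cell (assumed rule)."""
    dx, dy = MOVES[action]
    target = (agent[0] + dx, agent[1] + dy)
    if not in_bounds(target):
        return agent, box  # blocked by the wall: nothing moves
    if target == box:
        pushed = (box[0] + dx, box[1] + dy)
        if not in_bounds(pushed):
            return agent, box  # box is against the wall: nothing moves
        return target, pushed
    return target, box


def brute_force_plan(agent, box, goal, max_len=6):
    """Try every action sequence, shortest first, until the box ends on the goal."""
    for length in range(max_len + 1):
        for plan in product(MOVES, repeat=length):
            a, b = agent, box
            for action in plan:
                a, b = step(a, b, action)
            if b == goal:
                return "".join(plan)
    return None  # no plan of at most max_len actions works


plan = brute_force_plan(agent=(0, 0), box=(1, 0), goal=(3, 0))
print(plan)
```

Nothing in this loop cares how the objective gets met: the search simply returns whichever action sequence scores best under the stated goal, which is the property the episode builds on.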

The slide deck I’m presenting was created by Jaan Tallinn, cofounder of the Future of Life Institute.

00:00 Introduction

01:24 The Toy Model

06:19 Misalignment and Manipulation Drives

12:57 Search Capacity and Ontological Insights

16:33 Irrelevant Concepts in AI Control

20:14 Approaches to Solving AI Control Problems

23:38 Final Thoughts

Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!

Doom Debates’ mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates

Get full access to Doom Debates at lironshapira.substack.com/subscribe

Doom Debates

By Liron Shapira
