AI Explained Official Podcast

When Will AI Models Blackmail You, and Why?



In the last few days, Anthropic has released an impressively honest account of how all models blackmail, no matter what goal they have, and despite warnings in the prompt and other preventive measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough 
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario? 
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void


Non-hype Newsletter: https://signaltonoise.beehiiv.com/


By Philip - Host of AI Explained YT

Rating: 3.1 (9 ratings)

