


In the last few days, Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventive measures. But do these models *want* this?
Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario?
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details
Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
By Philip - Host of AI Explained YT
Rating: 3.1 (99 ratings)
