muckrAIkers

The Co-opting of Safety

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with incidents like Grok's "MechaHitler" episode, we explore how real safety engineering differs from AI "alignment," why the "alignment tax" is a myth, and why this semantic confusion matters for actual safety.

  • (00:00) - Intro
  • (00:21) - Mecha-Hitler Grok
  • (10:07) - "Safety"
  • (19:40) - Under-specification
  • (53:56) - This time isn't different
  • (01:01:46) - Alignment Tax myth
  • (01:17:37) - Actually making AI safer

  • Links
    • JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
    • Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
    • SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

    Additional Referenced Papers

    • NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
    • ICML paper - AI Control: Improving Safety Despite Intentional Subversion
    • ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
    • OSF preprint - Current Real-World Use of Large Language Models for Mental Health
    • Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Inciting Examples

    • Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
    • The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
    • BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

    Other Sources

    • London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
    • Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
    • LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
    • EA Forum blogpost - An Overview of the AI Safety Funding Situation
    • Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
    • Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
    • Pleias website
    • Wikipedia page on Jaywalking

    muckrAIkers, by Jacob Haimes and Igor Krawczuk