muckrAIkers

The Co-opting of Safety

We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with incidents like Grok's "MechaHitler" episode, we explore how real safety engineering differs from AI "alignment," why the "alignment tax" is a myth, and why this semantic confusion matters for actual safety.

  • (00:00) - Intro
  • (00:21) - Mecha-Hitler Grok
  • (10:07) - "Safety"
  • (19:40) - Under-specification
  • (53:56) - This time isn't different
  • (01:01:46) - Alignment Tax myth
  • (01:17:37) - Actually making AI safer

  • Links
    • JMLR article - Underspecification Presents Challenges for Credibility in Modern Machine Learning
    • Trail of Bits paper - Towards Comprehensive Risk Assessments and Assurance of AI-Based Systems
    • SSRN paper - Uniqueness Bias: Why It Matters, How to Curb It

    Additional Referenced Papers

    • NeurIPS paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
    • ICML paper - AI Control: Improving Safety Despite Intentional Subversion
    • ICML paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
    • OSF preprint - Current Real-World Use of Large Language Models for Mental Health
    • Anthropic preprint - Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Inciting Examples

    • Ars Technica article - US government agency drops Grok after MechaHitler backlash, report says
    • The Guardian article - Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats
    • BBC article - Update that made ChatGPT 'dangerously' sycophantic pulled

    Other Sources

    • London Daily article - UK AI Safety Institute Rebrands as AI Security Institute to Focus on Crime and National Security
    • Vice article - Prominent AI Philosopher and ‘Father’ of Longtermism Sent Very Racist Email to a 90s Philosophy Listserv
    • LessWrong blogpost - "notkilleveryoneism" sounds dumb (see comments)
    • EA Forum blogpost - An Overview of the AI Safety Funding Situation
    • Book by Dmitry Chernov and Didier Sornette - Man-made Catastrophes and Risk Information Concealment
    • Euronews article - OpenAI adds mental health safeguards to ChatGPT, saying chatbot has fed into users’ ‘delusions’
    • Pleias website
    • Wikipedia page on Jaywalking

    muckrAIkers, by Jacob Haimes and Igor Krawczuk