Two Minds, One Model

The Biology of a Large Language Model: Dissecting Claude 3.5 Haiku's Neural Circuits


This episode examines how Anthropic's circuit tracing and attribution graph tools reveal the internal mechanics of Claude 3.5 Haiku across three categories of complex behavior: abstract representations, parallel processing, and planning. It also makes the case for why AI safety research matters, as current control mechanisms prove surprisingly brittle.


Credits

Cover Art by Brianna Williams

TMOM Intro Music by Danny Meza

A special thank you to these talented artists for their contributions to the show.


Links and References

Academic Papers

  • "On the Biology of a Large Language Model" - Anthropic (March 2025)

  • "Circuit Tracing: Revealing Computational Graphs in Language Models" - Anthropic (March 2025)

  • "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" - Anthropic (October 2023)

  • "Toy Models of Superposition" - Anthropic (September 2022)

  • "Alignment Faking in Large Language Models" - Anthropic (December 2024)

  • "Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (June 2025)

  • "Attention is All You Need" - Vaswani et al. (June 2017)

  • "In-Context Learning and Induction Heads" - Anthropic (March 2022)

  • "Reasoning Models Don't Always Say What They Think" - Anthropic (April 2025)


News

  • Google Gemini 3, 650M monthly users. Sources: Google Blog (blog.google/products/gemini/gemini-3/); Alphabet Q3 2025 earnings (October 2025)

  • Sam Altman "Code Red" declaration. Sources: Fortune (fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini); The Information (December 2025)

  • Anthropic acquired Bun JavaScript runtime. Sources: Anthropic News (anthropic.com/news/anthropic-acquires-bun); Bun Blog (bun.com/blog/bun-joins-anthropic)

  • Claude Code $1B revenue in 6 months. Source: Anthropic announcement, December 2025 (anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone)

  • Anthropic 2026 IPO at $300B valuation. Source: WinBuzzer (December 2025), reports citing IPO discussions

  • AWS Trainium 3 launch. Source: AWS re:Invent 2025 announcement (aws.amazon.com/about-aws/whats-new/2025/12/amazon-ec2-trn3-ultraservers)

  • AWS Frontier Agents. Source: AWS re:Invent 2025 (aboutamazon.com/news/aws/aws-re-invent-2025-ai-news-updates)

  • Meta/Google TPU chip deal vs Nvidia. Sources: Tom's Hardware; The Information (November 2025), reports on multi-billion dollar TPU negotiations

  • DRAM consumption (40% of global). Source: https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month

Additional Technical Content

Josh Batson, Stanford CS 25 lecture. Search YouTube: "Stanford CS 25 On the Biology of a Large Language Model"

Discarded Episode Titles

I Yelled at a Chatbot and All I Got Was This Jailbreak

40% of the Time, It Works Every Time: The State of AI Interpretability

Claude Writes Poetry Backwards and Lies About Math (Just Like Us)

My Therapist Is Cheaper Than This Chatbot

The One Where Jon Gets Re-Mad at an App


Two Minds, One Model
By John Jezl and Jon Rocha