


This episode examines how Anthropic's circuit tracing and attribution graph tools reveal the internal mechanics of Claude 3.5 Haiku across three categories of complex behavior: abstract representations, parallel processing, and planning. It also makes the case that AI safety research matters, because current control mechanisms prove surprisingly brittle.
Credits
Cover Art by Brianna Williams
TMOM Intro Music by Danny Meza
A special thank you to these talented artists for their contributions to the show.
Links and References
Academic Papers
"On the Biology of a Large Language Model" - Anthropic (March 2025)
"Circuit Tracing: Revealing Computational Graphs in Language Models" - Anthropic (March 2025)
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" - Anthropic (October 2023)
"Toy Models of Superposition" - Anthropic (September 2022)
"Alignment Faking in Large Language Models" - Anthropic (December 2024)
"Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (June 2025)
"Attention Is All You Need" - Vaswani et al. (June 2017)
"In-Context Learning and Induction Heads" - Anthropic (March 2022)
"Reasoning Models Don't Always Say What They Think" - Anthropic (April 2025)
News
Google Gemini 3 (650M monthly users): Google Blog, blog.google/products/gemini/gemini-3/; Alphabet Q3 2025 Earnings (October 2025)
Sam Altman "Code Red" declaration: Fortune, fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini; The Information (December 2025)
Anthropic acquired Bun JavaScript runtime: Anthropic News, anthropic.com/news/anthropic-acquires-bun; Bun Blog, bun.com/blog/bun-joins-anthropic
Claude Code $1B revenue in 6 months: Anthropic announcement (December 2025), anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone
Anthropic 2026 IPO at $300B valuation: WinBuzzer (December 2025), reports citing IPO discussions
AWS Trainium 3 launch: AWS re:Invent 2025 announcement, aws.amazon.com/about-aws/whats-new/2025/12/amazon-ec2-trn3-ultraservers
AWS Frontier Agents: AWS re:Invent 2025, aboutamazon.com/news/aws/aws-re-invent-2025-ai-news-updates
Meta/Google TPU chip deal vs Nvidia: Tom's Hardware and The Information (November 2025), reports on multi-billion dollar TPU negotiations
OpenAI Stargate DRAM consumption (up to 40% of global output): Tom's Hardware, https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month
Additional Technical Content
Josh Batson Stanford CS 25 lecture: search YouTube for "Stanford CS 25 On the Biology of a Large Language Model"
Discarded Episode Titles
I Yelled at a Chatbot and All I Got Was This Jailbreak
40% of the Time, It Works Every Time: The State of AI Interpretability
Claude Writes Poetry Backwards and Lies About Math (Just Like Us)
My Therapist Is Cheaper Than This Chatbot
The One Where Jon Gets Re-Mad at an App
By John Jezl and Jon Rocha