Latent Space: The AI Engineer Podcast

The Utility of Interpretability: Emmanuel Ameisen



Emmanuel Ameisen is lead author of “Circuit Tracing: Revealing Computational Graphs in Language Models” (https://transformer-circuits.pub/2025/attribution-graphs/methods.html), one of a pair of MechInterp papers that Anthropic published in March (the other is https://transformer-circuits.pub/2025/attribution-graphs/biology.html).

We recorded the initial conversation a month ago, but then held off publishing until the open source tooling for the graph generation discussed in this work was released last week: https://www.anthropic.com/research/open-source-circuit-tracing

This is a two-part episode: an intro covering the open source release, then a deeper dive into the paper, with guest host Vibhu Sapra (https://x.com/vibhuuuus) and Mochi the MechInterp Pomsky (https://x.com/mochipomsky). Thanks to Vibhu for making this episode happen!

The original blogpost contained some fantastic guided visualizations (which we discuss at the end of this pod!). With the notebook and Neuronpedia visualization (https://www.neuronpedia.org/gemma-2-2b/graph) released this week, you can now explore on your own, as we show you in the video version of this pod.


Timestamps

00:00 Intro & Guest Introductions
01:00 Anthropic's Circuit Tracing Release
06:11 Exploring Circuit Tracing Tools & Demos
13:01 Model Behaviors and User Experiments
17:02 Behind the Research: Team and Community
24:19 Main Episode Start: Mech Interp Backgrounds
25:56 Getting Into Mech Interp Research
31:52 History and Foundations of Mech Interp
37:05 Core Concepts: Superposition & Features
39:54 Applications & Interventions in Models
45:59 Challenges & Open Questions in Interpretability
57:15 Understanding Model Mechanisms: Circuits & Reasoning
01:04:24 Model Planning, Reasoning, and Attribution Graphs
01:30:52 Faithfulness, Deception, and Parallel Circuits
01:40:16 Publishing Risks, Open Research, and Visualization
01:49:33 Barriers, Vision, and Call to Action



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

Latent Space: The AI Engineer Podcast, by Latent.Space


