Training Data

Mapping the Mind of a Neural Net: Goodfire’s Eric Ho on the Future of Interpretability

Eric Ho is building Goodfire to solve one of AI’s most critical challenges: understanding what’s actually happening inside neural networks. His team is developing techniques to understand, audit, and edit neural networks at the feature level. Eric discusses breakthrough results in resolving superposition with sparse autoencoders, successful model-editing demonstrations, and real-world applications in genomics with Arc Institute’s DNA foundation models. He argues that interpretability will be critical as AI systems become more powerful and take on mission-critical roles in society.

Hosted by Sonya Huang and Roelof Botha, Sequoia Capital

Mentioned in this episode:

  • Mech interp: Short for mechanistic interpretability; a list of important papers here

  • Phineas Gage: 19th-century railroad construction foreman who survived an accident that destroyed much of his brain’s left frontal lobe. He became a famous case study in neuroscience.

  • Human Genome Project: Effort from 1990 to 2003 to generate the first sequence of the human genome, which accelerated the study of human biology

  • Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

  • Zoom In: An Introduction to Circuits: Foundational mechanistic interpretability paper from OpenAI, published in 2020

  • Superposition: Concept borrowed from physics; in interpretability, the phenomenon by which a neural network simulates a larger network, representing more concepts than it has neurons

  • Apollo Research: AI safety company that designs AI model evaluations and conducts interpretability research

  • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. 2023 Anthropic paper that uses a sparse autoencoder to extract interpretable features; followed by Scaling Monosemanticity

  • Under the Hood of a Reasoning Model: 2025 Goodfire paper that interprets DeepSeek’s reasoning model R1

  • Auto-interpretability: The ability to use LLMs to automatically write explanations for the behavior of neurons in LLMs

  • Interpreting Evo 2: Arc Institute’s Next-Generation Genomic Foundation Model (see episode with Arc co-founder Patrick Hsu)

  • Paint with Ember: Canvas interface from Goodfire that lets you steer an LLM’s visual output in real time (paper here)

  • Model diffing: Interpreting how a model differs from checkpoint to checkpoint during finetuning

  • Feature steering: Changing the style of LLM output by up- or down-weighting features (e.g. talking like a pirate vs. giving factual information about the Andromeda Galaxy)

  • Weight-based interpretability: Method for directly decomposing neural network parameters into mechanistic components, instead of working with activation features

  • The Urgency of Interpretability: Essay by Anthropic co-founder and CEO Dario Amodei

  • On the Biology of a Large Language Model: Goodfire collaboration with Anthropic
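The feature-steering idea mentioned above can be sketched in a few lines: a feature found by a sparse autoencoder corresponds to a direction in a model’s activation space, and steering adds a scaled copy of that direction to an activation before it flows onward. The vectors and the `steer` helper below are hypothetical toy values for illustration, not taken from any real model or from Goodfire’s tooling.

```python
# Toy sketch of feature steering (all numbers hypothetical).
# A sparse autoencoder assigns each learned feature a direction in
# activation space; "steering" adds a scaled copy of that direction
# to an activation vector.

def steer(activation, feature_direction, alpha):
    """Return the activation with alpha * feature_direction added in."""
    return [a + alpha * d for a, d in zip(activation, feature_direction)]

# A 4-dimensional toy activation and a hypothetical unit-norm feature
# direction (pretend this is the "talk like a pirate" feature).
activation = [0.5, -0.25, 0.125, 0.75]
pirate_direction = [0.0, 1.0, 0.0, 0.0]

# A positive alpha up-weights the feature, pushing output toward that
# style; a negative alpha would suppress it instead.
steered = steer(activation, pirate_direction, alpha=2.0)
print(steered)  # [0.5, 1.75, 0.125, 0.75]
```

In practice the intervention happens inside the network (e.g. on the residual stream at a chosen layer), but the arithmetic is exactly this simple vector addition.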
