Machine Learning Street Talk (MLST)

Neel Nanda - Mechanistic Interpretability


Listen Later

In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and linear directional features which are often communicated via a "residual stream", an information highway models use to pass information between layers.

Neel argues that "superposition", the ability for models to represent more features than they have neurons, is one of the biggest open problems in interpretability. This is because superposition thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition. However, Neel notes we must start small, build rigorous foundations, and not assume our theoretical frameworks perfectly match reality.

The conversation turns to whether models can have goals or agency, with Neel arguing they likely can based on heuristics like models executing long term plans towards some objective. However, we currently lack techniques to build models with specific goals, meaning any goals would likely be learned or emergent. Neel highlights how induction heads, circuits models use to track long range dependencies, seem crucial for phenomena like in-context learning to emerge.

On the existential risks from AI, Neel believes we should avoid overly confident claims that models will or will not be dangerous, as we do not understand them enough to make confident theoretical assertions. However, models could pose risks through being misused, having undesirable emergent properties, or being imperfectly aligned. Neel argues we must pursue rigorous empirical work to better understand and ensure model safety, avoid "philosophizing" about definitions of intelligence, and focus on ensuring researchers have standards for what it means to decide a system is "safe" before deploying it. Overall, a thoughtful conversation on one of the most important issues of our time.


Support us! https://www.patreon.com/mlst

MLST Discord: https://discord.gg/aNPkGUQtc5

Twitter: https://twitter.com/MLStreetTalk


Neel Nanda: https://www.neelnanda.io/


TOC

[00:00:00] Introduction and Neel Nanda's Interests (walk and talk)

[00:03:15] Mechanistic Interpretability: Reverse Engineering Neural Networks

[00:13:23] Discord questions

[00:21:16] Main interview kick-off in studio

[00:49:26] Grokking and Sudden Generalization

[00:53:18] The Debate on Systematicity and Compositionality

[01:19:16] How do ML models represent their thoughts

[01:25:51] Do Large Language Models Learn World Models?

[01:53:36] Superposition and Interference in Language Models

[02:43:15] Transformers discussion

[02:49:49] Emergence and In-Context Learning

[03:20:02] Superintelligence/XRisk discussion


Transcript: https://docs.google.com/document/d/1FK1OepdJMrqpFK-_1Q3LQN6QLyLBvBwWW_5z8WrS1RI/edit?usp=sharing

Refs: https://docs.google.com/document/d/115dAroX0PzSduKr5F1V4CWggYcqIoSXYBhcxYktCnqY/edit?usp=sharing


...more
View all episodesView all episodes
Download on the App Store

Machine Learning Street Talk (MLST)By Machine Learning Street Talk (MLST)

  • 4.7
  • 4.7
  • 4.7
  • 4.7
  • 4.7

4.7

83 ratings


More shows like Machine Learning Street Talk (MLST)

View all
Data Skeptic by Kyle Polich

Data Skeptic

470 Listeners

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) by Sam Charrington

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

434 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

296 Listeners

NVIDIA AI Podcast by NVIDIA

NVIDIA AI Podcast

324 Listeners

Practical AI by Practical AI LLC

Practical AI

190 Listeners

Google DeepMind: The Podcast by Hannah Fry

Google DeepMind: The Podcast

200 Listeners

Last Week in AI by Skynet Today

Last Week in AI

282 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

352 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

125 Listeners

This Day in AI Podcast by Michael Sharkey, Chris Sharkey

This Day in AI Podcast

190 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio

Latent Space: The AI Engineer Podcast

63 Listeners

"Upstream" with Erik Torenberg by Erik Torenberg

"Upstream" with Erik Torenberg

64 Listeners

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis

422 Listeners

AI + a16z by a16z

AI + a16z

33 Listeners

Training Data by Sequoia Capital

Training Data

36 Listeners