80,000 Hours Podcast

#107 – Chris Olah on what the hell is going on inside neural networks


Listen Later

Big machine learning models can identify plant species better than any human, write passable essays, beat you at a game of Starcraft 2, figure out how a photo of Tobey Maguire and the word 'spider' are related, solve the 60-year-old 'protein folding problem', diagnose some diseases, play romantic matchmaker, write solid computer code, and offer questionable legal advice.

Humanity made these amazing and ever-improving tools. So how do our creations work? In short: we don't know.

Today's guest, Chris Olah, finds this both absurd and unacceptable. Over the last ten years he has been a leader in the effort to unravel what's really going on inside these black boxes. As part of that effort he helped create the famous DeepDream visualisations at Google Brain, reverse engineered the CLIP image classifier at OpenAI, and is now continuing his work at Anthropic, a new $100 million research company that tries to "co-develop the latest safety techniques alongside scaling of large ML models".

Links to learn more, summary and full transcript.

Despite having a huge fan base thanks to his explanations of ML and tweets, today's episode is the first long interview Chris has ever given. It features his personal take on what we've learned so far about what ML algorithms are doing, and what's next for this research agenda at Anthropic.

His decade of work has borne substantial fruit, producing an approach for looking inside the mess of connections in a neural network and back out what functional role each piece is serving. Among other things, Chris and team found that every visual classifier seems to converge on a number of simple common elements in their early layers — elements so fundamental they may exist in our own visual cortex in some form.

They also found networks developing 'multimodal neurons' that would trigger in response to the presence of high-level concepts like 'romance', across both images and text, mimicking the famous 'Halle Berry neuron' from human neuroscience.

While reverse engineering how a mind works would make any top-ten list of the most valuable knowledge to pursue for its own sake, Chris's work is also of urgent practical importance. Machine learning models are already being deployed in medicine, business, the military, and the justice system, in ever more powerful roles. The competitive pressure to put them into action as soon as they can turn a profit is great, and only getting greater.

But if we don't know what these machines are doing, we can't be confident they'll continue to work the way we want as circumstances change. Before we hand an algorithm the proverbial nuclear codes, we should demand more assurance than "well, it's always worked fine so far".

But by peering inside neural networks and figuring out how to 'read their minds' we can potentially foresee future failures and prevent them before they happen. Artificial neural networks may even be a better way to study how our own minds work, given that, unlike a human brain, we can see everything that's happening inside them — and having been posed similar challenges, there's every reason to think evolution and 'gradient descent' often converge on similar solutions.

Among other things, Rob and Chris cover:

• Why Chris thinks it's necessary to work with the largest models
• What fundamental lessons we've learned about how neural networks (and perhaps humans) think
• How interpretability research might help make AI safer to deploy, and Chris’ response to skeptics
• Why there's such a fuss about 'scaling laws' and what they say about future AI progress

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel

...more
View all episodesView all episodes
Download on the App Store

80,000 Hours PodcastBy The 80,000 Hours team

  • 4.7
  • 4.7
  • 4.7
  • 4.7
  • 4.7

4.7

304 ratings


More shows like 80,000 Hours Podcast

View all
Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,380 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,461 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,105 Listeners

Azeem Azhar's Exponential View by Azeem Azhar

Azeem Azhar's Exponential View

616 Listeners

The Joe Walker Podcast by Joe Walker

The Joe Walker Podcast

124 Listeners

ChinaTalk by Jordan Schneider

ChinaTalk

291 Listeners

Your Undivided Attention by The Center for Humane Technology, Tristan Harris, Aza Raskin

Your Undivided Attention

1,635 Listeners

Google DeepMind: The Podcast by Hannah Fry

Google DeepMind: The Podcast

203 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

101 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

551 Listeners

Big Technology Podcast by Alex Kantrowitz

Big Technology Podcast

512 Listeners

Hard Fork by The New York Times

Hard Fork

5,576 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

150 Listeners

"Econ 102" with Noah Smith and Erik Torenberg by Turpentine

"Econ 102" with Noah Smith and Erik Torenberg

147 Listeners

The Marginal Revolution Podcast by Mercatus Center at George Mason University

The Marginal Revolution Podcast

91 Listeners