AXRP - the AI X-risk Research Podcast

22 - Shard Theory with Quintin Pope



What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading researcher in the shard theory community. We talk about what shard theory is, what it says about humans and neural networks, and what the implications are for making AI safe.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Episode art by Hamish Doodles: hamishdoodles.com

 

Topics we discuss, and timestamps:

 - 0:00:42 - Why understand human value formation?

   - 0:19:59 - Why not design methods to align to arbitrary values?

 - 0:27:22 - Postulates about human brains

   - 0:36:20 - Sufficiency of the postulates

   - 0:44:55 - Reinforcement learning as conditional sampling

   - 0:48:05 - Compatibility with genetically-influenced behaviour

   - 1:03:06 - Why deep learning is basically what the brain does

 - 1:25:17 - Shard theory

   - 1:38:49 - Shard theory vs expected utility optimizers

   - 1:54:45 - What shard theory says about human values

 - 2:05:47 - Does shard theory mean we're doomed?

   - 2:18:54 - Will nice behaviour generalize?

   - 2:33:48 - Does alignment generalize farther than capabilities?

 - 2:42:03 - Are we at the end of machine learning history?

 - 2:53:09 - Shard theory predictions

 - 2:59:47 - The shard theory research community

   - 3:13:45 - Why do shard theorists not work on replicating human childhoods?

 - 3:25:53 - Following shardy research

 

The transcript: axrp.net/episode/2023/06/15/episode-22-shard-theory-quintin-pope.html

 

Shard theorist links:

 - Quintin's LessWrong profile: lesswrong.com/users/quintin-pope

 - Alex Turner's LessWrong profile: lesswrong.com/users/turntrout

 - Shard theory Discord: discord.gg/AqYkK7wqAG

 - EleutherAI Discord: discord.gg/eleutherai

 

Research we discuss:

 - The Shard Theory Sequence: lesswrong.com/s/nyEFg3AuJpdAozmoX

 - Pretraining Language Models with Human Preferences: arxiv.org/abs/2302.08582

 - Inner alignment in salt-starved rats: lesswrong.com/posts/wcNEXDHowiWkRxDNv/inner-alignment-in-salt-starved-rats

 - Intro to Brain-like AGI Safety Sequence: lesswrong.com/s/HzcM2dkCq7fwXBej8

 - Brains and transformers:

   - The neural architecture of language: Integrative modeling converges on predictive processing: pnas.org/doi/10.1073/pnas.2105646118

   - Brains and algorithms partially converge in natural language processing: nature.com/articles/s42003-022-03036-1

   - Evidence of a predictive coding hierarchy in the human brain listening to speech: nature.com/articles/s41562-022-01516-2

 - Singular learning theory explainer: Neural networks generalize because of this one weird trick: lesswrong.com/posts/fovfuFdpuEwQzJu2w/neural-networks-generalize-because-of-this-one-weird-trick

 - Singular learning theory links: metauni.org/slt/

 - Implicit Regularization via Neural Feature Alignment, aka circles in the parameter-function map: arxiv.org/abs/2008.00938

 - The shard theory of human values: lesswrong.com/s/nyEFg3AuJpdAozmoX/p/iCfdcxiyr2Kj8m8mT

 - Predicting inductive biases of pre-trained networks: openreview.net/forum?id=mNtmhaDkAr

 - Understanding and controlling a maze-solving policy network, aka the cheese vector: lesswrong.com/posts/cAC4AXiNC5ig6jQnc/understanding-and-controlling-a-maze-solving-policy-network

 - Quintin's Research agenda: Supervising AIs improving AIs: lesswrong.com/posts/7e5tyFnpzGCdfT4mR/research-agenda-supervising-ais-improving-ais

 - Steering GPT-2-XL by adding an activation vector: lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector

 

Links for the addendum on mesa-optimization skepticism:

 - Quintin's response to Yudkowsky arguing against AIs being steerable by gradient descent: lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Yudkowsky_argues_against_AIs_being_steerable_by_gradient_descent_

 - Quintin on why evolution is not like AI training: lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Edit__Why_evolution_is_not_like_AI_training

 - Evolution provides no evidence for the sharp left turn: lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn

 - Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets: arxiv.org/abs/1905.10854

AXRP - the AI X-risk Research Podcast, by Daniel Filan

