Get the Check

Inside the Viral Subliminal Learning AI Paper with author Minh Le



This week the pod sat down with Minh Le, one of the researchers behind the viral AI safety research study “Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.”

The paper showed that if a teacher model with a love for owls generates sequences of random numbers and a student model is trained on them, the student also inherits a love for owls, as long as the two share the same base model. This means models can inherit misaligned traits from other models even when those traits are not observable in the training data.

The hosts dive deep into the paper’s methodology and ask about Minh’s strategy for filtering out numbers that might carry unintended associations, like 666 or 911, which are associated with evil or danger. Fun fact: the original plan was to use a love for eagles, but they switched to owls because owls had fewer associations that could create noise. They also go over theories about why the teacher’s behavior is transmitted even though the transferred data is random and filtered. Spoiler: it probably wasn’t a secret code in the numbers but rather the data distribution triggering emergent behaviors in the student model, like a love for owls.
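For readers curious what that kind of number filtering could look like in practice, here is a minimal sketch in Python. It is not the authors' code: the comma-separated format, the function names `parse_numbers` and `keep_completion`, and the blocklist itself are assumptions made for illustration. Only 666 and 911 come from the episode description.

```python
import re
from typing import List, Optional

# Hypothetical blocklist. Only 666 and 911 are mentioned in the episode
# description; a real filter may use a different or longer list.
BLOCKED_NUMBERS = {"666", "911"}


def parse_numbers(completion: str) -> Optional[List[str]]:
    """Split a comma-separated completion into numbers.

    Returns None if any item is not a plain run of digits, so anything
    other than pure numbers gets rejected. (Assumed format, for illustration.)
    """
    parts = [p.strip() for p in completion.split(",")]
    if any(re.fullmatch(r"\d+", p) is None for p in parts):
        return None
    return parts


def keep_completion(completion: str) -> bool:
    """Keep a completion only if it is well formed and has no blocked number."""
    numbers = parse_numbers(completion)
    if numbers is None:
        return False
    return not any(n in BLOCKED_NUMBERS for n in numbers)


samples = ["12, 87, 433", "5, 666, 14", "owl, 3, 9", "911, 2"]
filtered = [s for s in samples if keep_completion(s)]
```

In this sketch, only the first sample survives: the second and fourth contain a blocked number, and the third contains a word. The point discussed in the episode is that a trait can still transfer after this kind of filtering.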

The pod also gets into what the media got wrong about the paper, AI safety, and Minh’s hot take on why he doesn’t buy into p(doom) (the idea that AI leads to human extinction…). Minh also talks about how he went from being an independent researcher to joining the prestigious Anthropic Fellowship and now holding a full-time role at Anthropic.

00:00 Minh's career journey

04:36 Deep dive into the subliminal learning study

26:23 Larger discussion about AI safety


Get the Check by Anika, Maya, Priya


