Get the Check

Inside the Viral Subliminal Learning AI Paper with author Minh Le



This week the pod sat down with Minh Le, one of the researchers behind the viral AI safety research study “Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.”

The paper showed that if a teacher model with a love for owls trains a student model on a series of random numbers, the student also inherits the love for owls, as long as the two share the same base model. This means models can inherit misaligned traits from other models even when those traits are not observable in the training data.

The hosts dive deep into the paper's methodology and ask about Minh's strategy for filtering out numbers that might carry unintended associations, like 666 or 911, which are associated with evil or danger. Fun fact: the original plan was to use a love for eagles, but the team switched to owls because owls carried fewer associations that could create noise. They also go over theories about why the teacher's behavior is transmitted even though the data passed to the student is random and filtered. Spoiler: it probably wasn't a secret code in the numbers, but rather the data distribution triggering emergent behaviors, like a love for owls, in the student model.

The pod also gets into what the media got wrong about the paper, AI safety more broadly, and Minh's hot take on why he doesn't buy into p(doom), the idea that AI leads to human extinction. Minh also talks about how he went from being an independent researcher to the prestigious Anthropic Fellowship and now a full-time role at Anthropic.

00:00 Minh's career journey

04:36 Deep dive into the subliminal learning study

26:23 Larger discussion about AI safety


Get the Check by Anika, Maya, Priya


