Get the Check

Inside the Viral Subliminal Learning AI Paper with author Minh Le


This week, the pod sat down with Minh Le, one of the researchers behind the viral AI safety research study “Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.”

The paper showed that if a teacher model with a love for owls generates a series of seemingly random numbers, and a student model is trained on those numbers, the student also inherits the love for owls, as long as the two models share the same base model. This means models can inherit misaligned traits from other models even when those traits aren’t observable in the training data.

The hosts dive deep into the paper’s methodology and ask about Minh’s strategy for filtering out numbers that might carry unintended associations, like 666 or 911, which are associated with evil or danger. Fun fact: the original plan was to use a love for eagles, but the team switched to owls because owls carried fewer associations that could introduce noise. They also go over theories about why the teacher’s behavior is transmitted even though the data looks random and has been filtered. Spoiler: it probably wasn’t a secret code hidden in the numbers, but rather the data distribution itself triggering emergent behaviors in the student model, like a love for owls.

The pod also gets into what the media got wrong about the paper, AI safety more broadly, and Minh’s hot take on why he doesn’t buy into p(doom) (the estimated probability that AI leads to human extinction…). Minh also talks about how he went from being an independent researcher to the prestigious Anthropic Fellowship, and now to a full-time role at Anthropic.

00:00 Minh's career journey

04:36 Deep dive into the subliminal learning study

26:23 Larger discussion about AI safety

Get the Check by Anika, Maya, and Priya

Rating: 5 out of 5 (20 ratings)


More shows like Get the Check

  • The Journal. by The Wall Street Journal & Spotify Studios (6,119 listeners)
  • All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC (10,241 listeners)
  • Hard Fork by The New York Times (5,547 listeners)
  • Good Noticings by Vox Media Podcast Network (6,485 listeners)
  • The Ezra Klein Show by New York Times Opinion (16,317 listeners)
  • "Econ 102" with Noah Smith and Erik Torenberg by Turpentine (155 listeners)
  • BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod (461 listeners)