Astral Codex Ten Podcast

God Help Us, Let's Try To Understand AI Monosemanticity



Inside every AI is a bigger AI, trying to get out

You've probably heard AI is a "black box". No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, "reward" it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want. But God only knows what goes on inside of it.

This is bad for safety. For safety, it would be nice to look inside the AI and see whether it's executing an algorithm like "do the thing" or more like "trick the humans into thinking I'm doing the thing". But we can't. Because we can't look inside an AI at all.

Until now! Towards Monosemanticity, recently out of big AI company/research lab Anthropic, claims to have gazed inside an AI and seen its soul. It looks like this:

https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand


Astral Codex Ten Podcast, by Jeremiah


Rating: 4.8 (129 ratings)

