December 01, 2023

God Help Us, Let's Try To Understand AI Monosemanticity

23 minutes

Inside every AI is a bigger AI, trying to get out

You've probably heard AI is a "black box". No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, "reward" it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want. But God only knows what goes on inside of it.

This is bad for safety. For safety, it would be nice to look inside the AI and see whether it's executing an algorithm like "do the thing" or more like "trick the humans into thinking I'm doing the thing". But we can't. Because we can't look inside an AI at all.

Until now! Towards Monosemanticity, recently out of big AI company/research lab Anthropic, claims to have gazed inside an AI and seen its soul. It looks like this:

https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand

Astral Codex Ten Podcast

By Jeremiah

4.8

129 ratings
