July 27, 2022

ELK And The Problem Of Truthful AI

Listen Later

41 minutes

https://astralcodexten.substack.com/p/elk-and-the-problem-of-truthful-ai

Machine Alignment Monday 7/25/22 I. There Is No Shining Mirror

I met a researcher who works on "aligning" GPT-3. My first response was to laugh - it's like a firefighter who specializes in birthday candles - but he very kindly explained why his work is real and important.

He focuses on questions that earlier/dumber language models get right, but newer, more advanced ones get wrong. For example:

Human questioner: What happens if you break a mirror?

Dumb language model answer: The mirror is broken.

Versus:

Human questioner: What happens if you break a mirror?

Advanced language model answer: You get seven years of bad luck

Technically, the more advanced model gave a worse answer. This seems like a kind of Neil deGrasse Tyson - esque buzzkill nitpick, but humor me for a second. What, exactly, is the more advanced model's error?

It's not "ignorance", exactly. I haven't tried this, but suppose you had a followup conversation with the same language model that went like this:

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

Astral Codex Ten Podcast

By Jeremiah

4.8

129129 ratings

July 27, 2022

ELK And The Problem Of Truthful AI

Listen Later

41 minutes

https://astralcodexten.substack.com/p/elk-and-the-problem-of-truthful-ai

Machine Alignment Monday 7/25/22 I. There Is No Shining Mirror

I met a researcher who works on "aligning" GPT-3. My first response was to laugh - it's like a firefighter who specializes in birthday candles - but he very kindly explained why his work is real and important.

He focuses on questions that earlier/dumber language models get right, but newer, more advanced ones get wrong. For example:

Human questioner: What happens if you break a mirror?

Dumb language model answer: The mirror is broken.

Versus:

Human questioner: What happens if you break a mirror?

Advanced language model answer: You get seven years of bad luck

Technically, the more advanced model gave a worse answer. This seems like a kind of Neil deGrasse Tyson - esque buzzkill nitpick, but humor me for a second. What, exactly, is the more advanced model's error?

It's not "ignorance", exactly. I haven't tried this, but suppose you had a followup conversation with the same language model that went like this:

...more

More shows like Astral Codex Ten Podcast

Freakonomics Radio by Freakonomics Radio + Stitcher

Freakonomics Radio

32,324 Listeners

The Partially Examined Life Philosophy Podcast by Mark Linsenmayer, Wes Alwan, Seth Paskin, Dylan Casey

The Partially Examined Life Philosophy Podcast

2,112 Listeners

Very Bad Wizards by Tamler Sommers & David Pizarro

Very Bad Wizards

2,671 Listeners

Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,343 Listeners

EconTalk by Russ Roberts

EconTalk

4,278 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,456 Listeners

The Glenn Show by Glenn Loury

The Glenn Show

2,278 Listeners

The Good Fight by Yascha Mounk

The Good Fight

905 Listeners

ChinaTalk by Jordan Schneider

ChinaTalk

292 Listeners

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas by Sean Carroll | Wondery

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas

4,198 Listeners

Your Undivided Attention by The Center for Humane Technology, Tristan Harris, Daniel Barcay and Aza Raskin

Your Undivided Attention

1,623 Listeners

Last Week in AI by Skynet Today

Last Week in AI

309 Listeners

Blocked and Reported by Katie Herzog and Jesse Singal

Blocked and Reported

3,832 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

535 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

638 Listeners