July 31, 2024

“Twitter thread on AI takeover scenarios” by Richard_Ngo

Listen Later

5 minutes

This is a link post.

This is a slightly-edited version of a twitter thread I posted a few months ago about "internal deployment" threat models.

My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control.

Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up - it beats the best humans on almost all computer-based tasks.

Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas.

The [...]

---

First published:

July 31st, 2024

Source:

https://www.lesswrong.com/posts/tPfqnropv3WfchhYB/twitter-thread-on-ai-takeover-scenarios

---

Narrated by TYPE III AUDIO.

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

LessWrong (30+ Karma)

By LessWrong

July 31, 2024

“Twitter thread on AI takeover scenarios” by Richard_Ngo

Listen Later

5 minutes

This is a link post.

This is a slightly-edited version of a twitter thread I posted a few months ago about "internal deployment" threat models.

My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control.

Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up - it beats the best humans on almost all computer-based tasks.

Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas.

The [...]

---

First published:

July 31st, 2024

Source:

https://www.lesswrong.com/posts/tPfqnropv3WfchhYB/twitter-thread-on-ai-takeover-scenarios

---

Narrated by TYPE III AUDIO.

...more

More shows like LessWrong (30+ Karma)

The Daily by The New York Times

The Daily

112,842 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

130 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,215 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

531 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,221 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates by Liron Shapira

Doom Debates

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners