
Sign up to save your podcasts
Or


This is a slightly-edited version of a twitter thread I posted a few months ago about "internal deployment" threat models.
My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control.
Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up - it beats the best humans on almost all computer-based tasks.
Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas.
The [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrongThis is a slightly-edited version of a twitter thread I posted a few months ago about "internal deployment" threat models.
My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control.
Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up - it beats the best humans on almost all computer-based tasks.
Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas.
The [...]
---
First published:
Source:
Narrated by TYPE III AUDIO.

112,842 Listeners

130 Listeners

7,215 Listeners

531 Listeners

16,221 Listeners

4 Listeners

14 Listeners

2 Listeners