
This is a lightly edited version of a Twitter thread I posted a few months ago about "internal deployment" threat models.
My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control.
Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up: it beats the best humans on almost all computer-based tasks.
Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas.
The [...]
---
Narrated by TYPE III AUDIO.