
Recently, I spent a couple of hours talking with a friend about the state of the evidence for AI takeover scenarios.
Their trailhead question was (paraphrased):
Current AIs are getting increasingly general, but they’re not self-promoting or ambitious. They answer questions, but they don’t seem to pursue convergent instrumental goals for their own ends. How and why do AIs go from being the kind of thing that doesn't behave like that to the kind of thing that does?
The following is a writeup of my attempt at answering that question.
In brief:
The classic AI danger scenario involves at least one AI that pursues instrumentally convergent resources in service of a misaligned goal. For this story, the AI must have the capability to pursue instrumentally convergent resources and the inclination to do so in service of misaligned goals, against the interests of humans.
With regards to capability: The current generation of AIs is mostly not capable enough for pursuing instrumentally convergent resources to be a good strategy. But as AIs become more capable, we can expect them to do so more and more.
With regards to inclination: Current AIs sometimes pursue their own objectives even when they understand that is not [...]
---
Outline:
(01:46) Current AIs do pursue instrumental goals
(02:42) Current AIs do not pursue convergent instrumental goals qua convergent instrumental goals...
(05:32) ...but, we can expect them to improve at that
(07:05) Current AIs pursue goals that they know their human users don't want, in some contexts
(12:26) Summing up
The original text contained 3 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
