
Epistemic status: I've been thinking about this topic for about 15 years, which led me to some counterintuitive conclusions, and I'm now writing up my thoughts concisely.
Value Learning
Value Learning offers hope for the Alignment problem in Artificial Intelligence: if we can align our AIs sufficiently closely, then they will want to help us, and should converge to full alignment with human values. However, for this to be possible, they will need (at least) a definition of what the phrase 'human values' means. The long-standing proposal for this is Eliezer Yudkowsky's Coherent Extrapolated Volition (CEV):
In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI's utility function.
This is a somewhat hand-wavy definition. It feels like a limit of some convergence process along these lines might exist. However, the extrapolation process seems rather loosely defined, and without access to superhuman intelligence and sufficient computational resources, it's very [...]
---
Outline:
(00:24) Value Learning
(02:31) Evolutionary Psychology
(05:55) Two Asides
(09:52) Tool, or Equal?
(22:01) So it's a Tool
(29:16) Consequences
(33:43) Reflection
The original text contained 2 footnotes which were omitted from this narration.
---
---
Narrated by TYPE III AUDIO.
By LessWrong
