This is a linkpost for https://unstableontology.com/2023/12/31/a-case-for-ai-alignment-being-difficult/
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
This is an attempt to distill a model of AGI alignment that I have gained primarily from thinkers such as Eliezer Yudkowsky (and to a lesser extent Paul Christiano), but explained in my own terms rather than attempting to hew close to these thinkers. I think I would be pretty good at passing an ideological Turing test for Eliezer Yudkowsky on AGI alignment difficulty (but not AGI timelines). What I'm doing in this post is not quite that, though; it's more like finding a branch in the possibility space, as I see it, that is close enough to Yudkowsky's model that it's possible to talk in the same language.
Even if the problem turns out to not be very difficult, it's helpful to have a model of why one might think it is difficult, so as to identify weaknesses in the case and find AI designs that avoid the main difficulties. Progress on problems can be made by a combination of finding possible paths and finding impossibility results or difficulty arguments.
Most of what I say should not be taken as a statement on AGI timelines. Some problems that make alignment difficult, such as ontology identification, also make creating capable AGI difficult to some extent.
Source:
https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult
Narrated for LessWrong by Perrin Walker.
Share feedback on this narration.
[Curated Post] ✓