
---
client: agi_sf
project_id: core_readings
feed_id: agi_sf__alignment
narrator: pw
qa: mds
qa_time: 0h15m
---
One approach to the AI control problem goes like this:
1. Observe what the user of the system does.
2. Infer the user's preferences.
3. Try to make the world better according to those preferences.
This approach has the major advantage that we can begin empirical work today — we can actually build systems which observe user behavior, try to figure out what the user wants, and then help with that. There are many applications that people care about already, and we can set to work on making rich toy models.
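To make the "observe, infer, then help" pipeline concrete, here is a minimal sketch, not taken from the article: a hypothetical assistant that watches a user's actions, assumes the user is approximately (Boltzmann-)rational, and performs a Bayesian update over a small set of candidate goals. The goal names, action names, utility table, and rationality model are all invented for illustration.

```python
import numpy as np

# Toy "goal inference" sketch (hypothetical, for illustration only):
# a few candidate goals, a Boltzmann-rational model of the user,
# and Bayesian updating from observed actions.

GOALS = ["write_email", "book_flight", "summarize_doc"]    # hypothetical goals
ACTIONS = ["open_mail", "open_browser", "open_editor"]     # hypothetical actions

# Assumed utility of each action under each goal (rows: goals, cols: actions).
UTILITY = np.array([
    [2.0, 0.2, 1.0],   # write_email
    [0.1, 2.0, 0.3],   # book_flight
    [0.5, 0.4, 2.0],   # summarize_doc
])

def action_likelihood(goal_idx: int, beta: float = 2.0) -> np.ndarray:
    """P(action | goal) for a Boltzmann-rational user with inverse temperature beta."""
    logits = beta * UTILITY[goal_idx]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def infer_goal(observed_actions: list[str]) -> dict[str, float]:
    """Posterior over goals after observing the user's actions (uniform prior)."""
    log_post = np.zeros(len(GOALS))
    for action in observed_actions:
        a = ACTIONS.index(action)
        for g in range(len(GOALS)):
            log_post[g] += np.log(action_likelihood(g)[a])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return dict(zip(GOALS, post))

if __name__ == "__main__":
    # After watching the user open their mail client twice, the belief shifts
    # toward "write_email", and the system could offer help with that goal.
    print(infer_goal(["open_mail", "open_mail"]))
```

This kind of toy model is the sort of "rich toy model" the passage gestures at: simple enough to build today, but already raising the question of how good the inference has to be before acting on it is safe.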
It seems great to develop these capabilities in parallel with other AI progress, and to address whatever difficulties actually arise, as they arise. That is, in each domain where AI can act effectively, we’d like to ensure that AI can also act effectively in the service of goals inferred from users (and that this inference is good enough to support foreseeable applications).
This approach gives us a nice, concrete model of each difficulty we are trying to address. It also provides a relatively clear indicator of whether our ability to control AI lags behind our ability to build it. And by being technically interesting and economically meaningful now, it can help actually integrate AI control with AI practice.
Overall I think that this is a particularly promising angle on the AI safety problem.
Original article:
https://www.alignmentforum.org/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard
Author:
Paul Christiano
---
This article is featured on the AGI Safety Fundamentals: Alignment course curriculum.
Narrated by TYPE III AUDIO on behalf of BlueDot Impact.
Share feedback on this narration.