
https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
(As usual, this post was written by Nate Soares with some help and editing from Rob Bensinger.)
In my last post, I described a “hard bit” of the challenge of aligning AGI—the sharp left turn that comes when your system slides into the “AGI” capabilities well, the fact that alignment doesn’t generalize similarly well at this turn, and the fact that this turn seems likely to break a bunch of your existing alignment properties.
Here, I want to briefly discuss a variety of current research proposals in the field, to explain why I think this problem is currently neglected.
I also want to mention research proposals that do strike me as having some promise, or that strike me as adjacent to promising approaches.
Before getting into that, let me be very explicit about three points:
By LessWrong
