The Nonlinear Library

LW - continue working on hard alignment! don't give up! by carado


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: continue working on hard alignment! don't give up!, published by carado on March 24, 2023 on LessWrong.
let's call "hard alignment" the ("orthodox") problem, historically worked on by MIRI, of preventing strong agentic AIs from pursuing things we don't care about by default and destroying everything of value to us on the way there. let's call "easy" alignment the set of perspectives where some of this model is wrong — some of the assumptions are relaxed — such that saving the world is easier or more likely to be the default.
what should one be working on? as always, the calculation consists of comparing
p(hard) × how much value we can get in hard
versus
p(easy) × how much value we can get in easy
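spelled out as a decision rule (a minimal sketch of the comparison above; the V symbols stand in for "how much value we can get" under each assumption and are mine, not the post's):

\[
\text{work on hard} \iff p(\mathrm{hard}) \cdot V_{\mathrm{hard}} > p(\mathrm{easy}) \cdot V_{\mathrm{easy}}
\]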
given how AI capabilities are going, it's not unreasonable for people to start playing their outs — that is to say, to start acting as if alignment is easy, because if it's not we're doomed anyways. but i think, in this particular case, this is wrong.
this is the lesson of dying with dignity and bracing for the alignment tunnel: we should be cooperating with our counterfactual selves and continuing to save the world in whatever way actually seems promising, rather than taking refuge in falsehood.
to me, p(hard) is big enough, and my hard-compatible plan seems workable enough, that it makes sense for me to continue to work on it.
let's not give up on the assumptions that are true. there is still work that can be done to actually generate some dignity under them.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.