Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: “Endgame safety” for AGI, published by Steve Byrnes on January 24, 2023 on The AI Alignment Forum.
(Status: no pretense to originality, but a couple people said they found this terminology useful, so I’m sharing it more widely.)
There’s a category of AGI safety work that we might call “Endgame Safety”, where we’re trying to do all the AGI safety work that we couldn’t or didn’t do ahead of time, in the very last moments before (or even after) people are actually playing around with powerful AGI algorithms of the type that could get irreversibly out of control and cause catastrophe.
I think everyone agrees that Endgame Safety is important and unavoidable. If nothing else, for every last line of AGI source code, we can analyze what happens if that line has a bug, or if a cosmic ray flips a bit, and figure out how to write good unit tests, etc. But we’re obviously not going to have AGI source code until the endgame. That was an especially straightforward example, but I imagine that many other things will also fall into the Endgame Safety bucket, i.e. bigger-picture important things to know about AGI that we only realize when we’re in the thick of building it.
So I am not an “Endgame Safety denialist”; I don’t think anyone is. But I find that people are sometimes misled by thinking about Endgame Safety in the following two ways:
Bad argument 1: “Endgame Safety is really important. So let’s try to make the endgame happen ASAP, so that we can get to work on Endgame Safety!”
(example of this argument)
This is a bad argument because: what’s the rush? There’s going to be an endgame sooner or later, and we can do Endgame Safety research then! Bringing the endgame sooner is basically equivalent to having all the AI alignment and strategy researchers hibernate for some number of years N, and then wake up and get back to work. And that, in turn, is strictly worse than having all the AI alignment and strategy researchers do what they can during the next N years, and then also continue working after those N years have elapsed.
I claim that there are plenty of open problems in AGI safety / alignment that we can work on right now, that people are in fact working on right now, that seem robustly useful, and that are not in the category of “Endgame Safety”, e.g. my list of 7 projects, these 200 interpretability projects, this list, ELK, everything on the Alignment Forum, etc.
For example, sometimes I’ll have this discussion:
ME: “I don’t want to talk about (blah) aspect of how I think future AGI will be built, because all my opinions are either wrong or infohazards—the latter because (if correct) they might substantially speed the arrival of AGI, which gives us less time for safety / alignment research.”
THEM: “WTF dude, I’m an AGI safety / alignment researcher like you! That’s why I’m standing here asking you these questions! And I assure you: if you answer my questions, it will help me do good AGI safety research.”
So here’s my answer: I claim that this person is trying to do Endgame Safety right now, and I don’t want to help them. I think they should find something else to do right now instead, while they wait for some AI researcher to publish an answer to their prerequisite capabilities question. That’s bound to happen sooner or later! Or they can do contingency planning for each of the possible answers to their capabilities question. Whatever.
Bad argument 2: “Endgame Safety researchers will obviously be in a much better position to do safety / alignment research than we are today, because they’ll know more about how AGI works, and will probably have proto-AGI test results, etc. So, other things being equal, we should move resources from current, less-productive safety research to future, more-productive Endgame Safety research.”
The biggest problem here is that, while Endgame Safety...