Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Regulation May Be More Important Than AI Alignment For Existential Safety, published by otto.barten on August 24, 2023 on LessWrong.
Summary: Aligning a single powerful AI is not enough: we're only safe if no-one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) which makes sure no unaligned AI can ever be built, by anyone. However, the leading labs currently do not appear to be aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment.
Epistemic status: I've been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most of the literature on the topic. I have also discussed the topic with a fair number of experts (who in some cases seemed to agree, and in other cases did not seem to agree).
Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post.
This post is mostly about AI x-risk caused by a takeover. It may or may not be valid for other types of AI x-risks. This post is mostly about the 'end game' of AI existential risk, not about intermediate states.
AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: "if all you need is an object that doesn't do dangerous things, you could try a sponge". Even if a limited AI would be a safe AI, it would not reduce AI existential risk. This is because at some point, someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). This is the AI that would kill us, not the safe one.
This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently also in Dan Hendrycks' paper. This evolutionary dynamic is a fundamental property of AI existential risk and a large part of why the problem is so difficult. Yet, many in AI Alignment and industry seem to focus on aligning only a single AI, which I think is insufficient.
Yudkowsky aimed to solve this evolutionary problem (the fact that no-one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only not kill humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This would be the act that actually saves humanity from extinction, by making sure no unsafe superintelligences are ever created, by anyone (arguably, the aligned superintelligence would need to keep melting all GPUs, and any future hardware that could run AI, indefinitely; otherwise even a pivotal act may be insufficient).
The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferati...