Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "OpenAI, DeepMind, Anthropic, etc. should shut down," published by Tamsin Leake on December 17, 2023 on The AI Alignment Forum.
I expect that the point of this post is already obvious to many of the people reading it. Nevertheless, I believe that it is good to mention important things even if they seem obvious.
OpenAI, DeepMind, Anthropic, and other AI organizations focused on capabilities should shut down. This is what would maximize the utility of pretty much everyone, including the people working inside those organizations.
Let's call Powerful AI ("PAI") an AI system capable of either:
Steering the world towards what it wants hard enough that it can't be stopped.
Killing everyone "un-agentically", e.g. by being plugged into a protein printer and generating a supervirus.
and by "aligned" (or "alignment") I mean the property of a system such that, when it has the ability to {steer the world towards what it wants hard enough that it can't be stopped}, what it wants is nice things and not goals that entail killing literally everyone (which is the default).
We do not know how to make a PAI which does not kill literally everyone. OpenAI, DeepMind, Anthropic, and others are building towards PAI. Therefore, they should shut down, or at least shut down all of their capabilities progress and focus entirely on alignment.
"But China!" does not matter. We do not know how to build PAI that does not kill literally everyone. Neither does China. If China tries to build AI that kills literally everyone, it does not help if we decide to kill literally everyone first.
"But maybe the alignment plan of OpenAI/whatever will work out!" is wrong. It won't. It might work if they were careful enough and had enough time, but they're going too fast and they'll simply cause literally everyone to be killed by PAI before they would get to the point where they can solve alignment. Their strategy does not look like that of an organization trying to solve alignment. It's not just that they're progressing on capabilities too fast compared to alignment; it's that they're pursuing the kind of strategy which fundamentally gets to the point where PAI kills everyone before it gets to saving the world.
Yudkowsky's Six Dimensions of Operational Adequacy in AGI Projects describes an AGI project with an adequate alignment mindset as one where:
The project has realized that building an AGI is mostly about aligning it. Someone with full security mindset and deep understanding of AGI cognition as cognition has proven themselves able to originate new deep alignment measures, and is acting as technical lead with effectively unlimited political capital within the organization to make sure the job actually gets done.
Everyone expects alignment to be terrifically hard and terribly dangerous and full of invisible bullets whose shadow you have to see before the bullet comes close enough to hit you. They understand that alignment severely constrains architecture and that capability often trades off against transparency. The organization is targeting the minimal AGI doing the least dangerous cognitive work that is required to prevent the next AGI project from destroying the world. The alignment assumptions have been reduced into non-goal-valent statements, have been clearly written down, and are being monitored for their actual truth.
(emphasis mine)
Needless to say, this is not remotely what any of the major AI capabilities organizations look like.
At least Anthropic didn't particularly try to be a big commercial company making the public excited about AI. Making the AI race a big public thing was a huge mistake on OpenAI's part, and demonstrates that they don't really have any idea what they're doing.
It does not matter that those organizations have "AI safety" teams, if their AI safety teams do not have the power to take the...