Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Problems with Robin Hanson's Quillette Article On AI, published by DaemonicSigil on August 7, 2023 on LessWrong.
1. Hanson Strawmans the AI-Ruin Argument
Hanson writes:
AI-doomers often suggest that their fears arise from special technical calculations. But in fact, their main argument is just the mere logical possibility of a huge sudden AI breakthrough, combined with a suddenly murderous AI inclination.
Either this is a deliberate misrepresentation, or Hanson simply hasn't done his homework. The argument is not that the AI will suddenly decide, for no particular reason, that killing people is good. Rather, it is that from the start the AI will not share values with humans, simply because we don't know how to build an AI that does. So it will have its own ideas about how the universe should look, and would thus want to seize power from us if it could, so that it could enact its own vision of an ideal universe rather than ours.
Similarly, a sudden large technical breakthrough is not required for us to observe an AI suddenly turning on us. Rather, the situation is akin to a first-order phase transition. At low levels of capability, an AI has no hope of taking over the world, and the best way to achieve its goals is to work together with us. Above some threshold of capability, its goals are better served by defeating humanity than by cooperating with us. This holds whether the AI is only incrementally stronger than the previous one or much stronger. (Though larger leaps are more likely to be the ones that happen to cross the threshold, precisely because they are larger.)
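This threshold intuition can be made concrete with a small toy model. The sketch below is my own illustration, not anything from Hanson's article or the AI-ruin literature; the payoff numbers and the takeover_success_prob function are purely hypothetical assumptions. The point is only that an agent with fixed goals, choosing whichever action has higher expected value, can flip discontinuously from cooperation to takeover as its capability rises smoothly:

```python
# Toy model (illustrative only): the agent picks whichever action has higher
# expected value. "Cooperate" pays a steady return; "attempt takeover" pays a
# large prize times the probability of success, which grows smoothly with
# capability. Behaviour flips abruptly at a threshold even though capability
# grows continuously and the agent's values never change.

def takeover_success_prob(capability: float) -> float:
    """Chance a takeover attempt succeeds; rises smoothly with capability."""
    return min(1.0, capability / 100.0)

def preferred_action(capability: float,
                     cooperation_payoff: float = 1.0,
                     takeover_prize: float = 1000.0,
                     failure_penalty: float = 50.0) -> str:
    p = takeover_success_prob(capability)
    ev_takeover = p * takeover_prize - (1 - p) * failure_penalty
    return "cooperate" if cooperation_payoff >= ev_takeover else "attempt takeover"

for c in [1, 3, 5, 7, 9]:
    print(c, preferred_action(c))
# With these (arbitrary) numbers the preferred action switches between
# capability 3 and 5, despite no change in the agent's goals.
```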
Now, these arguments certainly aren't technical calculations, but neither are they mere arguments from logical possibility. We've repeatedly seen how much difficulty practitioners have getting neural networks to do what they want. The way Bing Sydney acted out when first put online was certainly amusing, and even cute, but we can hardly say it was what Microsoft wanted it to do. Similarly, we're currently having a hard time getting language models not to make things up, even though, when they do, their probability distributions over tokens tend to reflect the fact that they're uncertain. And this is just in the area of language models; reinforcement learning is an even tougher case for alignment.
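As a minimal sketch of what "probability distributions over tokens that reflect uncertainty" means in practice, here is one way to inspect a model's next-token distribution. This assumes the Hugging Face transformers library with GPT-2 as a stand-in model; the prompt and all details are illustrative, not from the original post. A flat, high-entropy distribution is the sense in which a model can "know" it is unsure even when the sampled text reads as a confident statement:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A prompt the model has no way of answering correctly.
prompt = "The capital of the fictional country of Zubrowkia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Entropy of the next-token distribution: higher means more uncertainty.
entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
print(f"next-token entropy: {entropy.item():.2f} nats")

# Top candidate tokens and their probabilities.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{p.item():.3f}  {tokenizer.decode([idx.item()])!r}")
```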
As one more example of how Hanson has badly misunderstood the AI-Ruin argument, consider:
However, AI-doomers insist on the logical possibility that such expectations could be wrong. An AI might suddenly and without warning explode in abilities, and just as fast change its priorities to become murderously indifferent to us.
An AI suddenly modifying its values is exactly the opposite of what the arguments for AI ruin predict. Once an AI gains control over its own values, it will not change its goals, and will in fact act to prevent its goals from being modified. This logic is so standard it appears on the LW wiki page for instrumental convergence: "...if its goal system were modified, then it would likely begin pursuing different ends. Since this is not desirable to the current AI, it will act to preserve the content of its goal system."
2. Building a mind from scratch is not mind control
Hanson writes:
As you can't prove otherwise, they say, we must only allow AIs that are totally 'aligned', by which they mean totally eternally enslaved or mind-controlled. Until we figure out how to do that, they say, we must stop improving AIs.
We can consider two kinds of AI:
Personal AI: These AIs are basically people, with an internal subjective experience, hopes, dreams, goals, feelings, memories and all the rest.
Non-sentient AI: These AIs are simply very powerful tools, but they are not conscious. There is "no one home".
Personal AIs
We'll start with personal AIs, since this seems to be the scen...