Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: To Predict What Happens, Ask What Happens, published by Zvi on May 31, 2023 on LessWrong.
When predicting the conditional probability of catastrophe from loss of human control over AGI, there are many distinct cruxes. This essay does not attempt a complete case, or the most generally convincing case, or to address the most common cruxes.
Instead, these are my best guesses at potentially mind-changing, armor-piercing questions people could ask themselves if they broadly accept key concepts (that power seeking is a key existential risk, that default development paths are likely catastrophic, that AI could defeat all of us combined), have read and thought hard about alignment difficulties, yet still think the odds of catastrophe are not so high.
In addition to this entry, I attempt an incomplete extended list of cruxes here, an attempted taxonomy of paths through developing AGI and potentially losing control here, and an attempted taxonomy of styles of alignment here, while leaving to the future or others for now a taxonomy of alignment difficulties.
Apologies in advance if some questions seem insulting or you rightfully answer with ‘no, I am not making that mistake.’ I don’t know a way around that.
Here are the questions up front:
What happens?
To what extent will humanity seek to avoid catastrophe?
How much will humans willingly give up, including control?
You know people and companies and nations are dumb and make dumb mistakes constantly, and mostly take symbolic actions or gesture at things rather than act strategically, and you’ve taken that into account, right?
What would count as a catastrophe?
Are you consistently tracking what you mean by alignment?
Would ‘human-strength’ alignment be sufficient?
If we figure out how to robustly align our AGIs, will we choose to and be able to make and keep them that way? Would we keep control?
How much hope is there that a misaligned AGI would choose to preserve humanity once it no longer needed us?
Are you factoring in unknown difficulties and surprises large and small that always arise, and in which direction do they point?
Are you treating doom as only happening through specified, detailed logical paths, which if they break down mean it's going to be fine?
Are you properly propagating your updates, and anticipating future updates?
Are you counting on in-distribution heuristics to work out of distribution?
Are you using instincts and heuristics rather than looking at mechanics, forming a model, doing the math, and using Bayes' Rule?
Is normalcy bias, hopeful thinking, avoidance of implications or social cognition subtly influencing your analysis? Are you unconsciously modeling after media?
What happens?
If you think first about ‘will there be doom?’ or ‘will there be a catastrophe?’ you are directly invoking hopeful or fearful thinking, shifting focus towards wrong questions, and concocting arbitrary ways for scenarios to end well or badly. Including expecting humans to, when in danger, act in ways humans don’t act.
Instead, ask: What happens?
What affordances would this AGI have? What happens to the culture? What posts get written on Marginal Revolution? What other questions jump to mind?
Then ask whether what happens was catastrophic.
To what extent will humanity seek to avoid catastrophe?
I often observe an importantly incorrect model here.
Take the part of your model that explains why:
We aren’t doing better global coordination to slow AI capabilities.
We frequently fail or muddle through when facing dangers.
So many have sided with what seem like obviously evil causes, often freely, often under pressure or for advantage.
Most people have no idea what is going on most of the time, and often huge things are happening that for long periods no one notices, or brings to general attention.
Then notice these dynamics do not st...