Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How LDT helps reduce the AI arms race, published by Tamsin Leake on December 10, 2023 on The AI Alignment Forum.
(Epistemic status: I think this is right?)
Alice is the CEO of ArmaggAI, and Bob is the CEO of BigModelsAI, two major AI capabilities organizations. They're racing to be the first to build a superintelligence aligned to their respective CEVs, which would take over the universe and satisfy their values.
Alice would prefer to slow down so she has a lot more time to make sure that what her company is building will be aligned; but she's worried about Bob's company taking advantage of that and racing ahead, resulting in Bob's utility function being {what the lightcone is filled with} instead of Alice's; so she feels like she has no choice but to race, to maximize her own utility. And likewise the other way around.
This state of affairs is far from the Pareto frontier of {their utility functions, each weighted by how likely they are to be the one to build an aligned superintelligence}: it causes a lot more worlds where everyone is dead, instead of either Alice's or Bob's utility function being maximized.
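(A toy illustration, with numbers made up purely for this example: suppose all-out racing gives each of them a 30% chance of being the one to launch an aligned superintelligence and a 40% chance that everyone dies, while a coordinated slowdown gives each a 45% chance with only 10% doom. If utility is roughly linear in how much of the lightcone gets optimized for your values, racing gets each of them an expected 0.3 lightcones' worth of value, while slowing down and splitting the lightcone between their two utility functions whenever either of them wins gets each of them about 0.9 × 0.5 = 0.45 - better for both, with far fewer dead worlds.)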
But they don't really get all this LDT business; it seems complicated to them. If only they had a convenient lesswrong post spelling out exactly what they should do, in clear easy-to-follow terms!
Here's the procedure they should implement. Alice should:
In worlds where she does launch a superintelligence aligned to her, and it takes over the world and builds an Alice::utopia, she should tell her AI:
Actually, don't just maximize my own utility function. Maximize a (threat-resistant) mixture of my utility function and Bob's, weighted by the ratio of timelines in which you think each of us launched an aligned superintelligence, but also weighted by the probability that both Bob and I implement this procedure.
Before that, she should slow down her company's capabilities progress.
And likewise for Bob. (One way of writing down the mixture is spelled out just below.)
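To make the mixture concrete - this is just one way of writing it down, and the estimates involved are the superintelligence's to make, not Alice's - the AI Alice launches would maximize something like U_deal = (1 - w) · U_Alice + w · U_Bob, where w is roughly Bob's share of the win-timelines, p_Bob / (p_Alice + p_Bob), scaled down by the probability that both of them actually implement this procedure (the threat-resistance part is omitted here for brevity).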
Now, their respective aligned superintelligences aren't omniscient; they could, for example, overestimate or underestimate how likely Bob or Alice would've been to implement this procedure. But one thing's pretty much for sure: neither Alice nor Bob is able to fool the future superintelligence into thinking that they'd implement this procedure when actually they wouldn't. Because it's superintelligent.
If Alice wins, she can tell whether Bob would follow the procedure, because her superintelligence can check (better than Bob can fake). And Bob doesn't have to count on wishful thinking to know that Alice would indeed do this instead of defecting, because in worlds where he wins, he can ask his superintelligence whether Alice would implement this procedure. They're each kept in check by the other's future winning self, and they can each rely on this being superintelligently checked by their respective future selves.
So the only way for Alice to get some of her utility maximized in worlds where Bob wins is to actually behave like this, including before either of them has launched a superintelligence. And likewise for Bob.
Their incentive gradient points in the direction of being more likely to follow this procedure, including slowing down their capabilities progress - and thus decreasing the number of worlds where their AI is unaligned and everyone dies forever.
In the real world, there are still Bobs and Alices who don't implement this procedure, but that's mostly because they don't know/understand that if they did, they would gain more utility. In many cases, it should suffice for them to be informed that this is indeed where their utility lies.
Once someone has demonstrated that they understand how LDT applies here, and that they're generally rational, then they should understand that implementing this protocol (including slowing down AI capabilities) is what maximizes their utility, and so you can count on...