The Nonlinear Library: Alignment Forum

AF - Some biases and selection effects in AI risk discourse by Tamsin Leake


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some biases and selection effects in AI risk discourse, published by Tamsin Leake on December 12, 2023 on The AI Alignment Forum.
These are some selection effects that shape which ideas people get exposed to and what they end up believing, in ways that make the overall epistemics worse. I've mostly noticed them in AI discourse (alignment research, governance, etc.), mostly on LessWrong. (They might not be exclusive to discourse on AI risk.)
Confusion about the problem often leads to useless research
People walk into AI discourse, and they have various confusions, such as:
What are human values?
Aligned to whom?
What does it mean for something to be an optimizer?
Okay, unaligned ASI would kill everyone, but how?
What about multipolar scenarios?
What counts as AGI, and when do we achieve that?
Those questions about the problem do not particularly need fancy research to be resolved; they're either already solved or there's a good reason why thinking about them is not useful to the solution. For these examples:
What are human values?
We don't need to figure this out; we can just implement CEV (coherent extrapolated volition) without ever having a good model of what "human values" are.
Aligned to whom?
The vast majority of the utility you have to gain is from {getting a utopia rather than everyone-dying-forever}, rather than {making sure you get the right utopia}.
What does it mean for something to be an optimizer?
Expected utility maximization seems to fully cover this. More general models aren't particularly useful to saving the world.
Okay, unaligned ASI would kill everyone, but how?
This does not particularly matter. If there is unaligned ASI, we just die, the same way AI now just wins at chess: you don't need to predict the exact moves to know the outcome. The outcome is the only part that particularly matters.
What about multipolar scenarios?
The various unaligned ASIs do a value-handshake (merge their goals) and kill everyone together.
What counts as AGI, and when do we achieve that?
People keep mentioning definitions of AGI such as "when 99% of currently fully remote jobs will be automatable" or "for almost all economically relevant cognitive tasks, at least matches any human's ability at the task".
I do not think such definitions are useful, because I don't think those milestones are particularly related to how likely it is, or when, AI will kill everyone. I expect AI to kill everyone before either of those events is observed, and even if it didn't, having passed them wouldn't particularly change when AI kills everyone. I usually talk about timelines until decisive strategic advantage (i.e. AI taking over the world), because that's what matters.
"AGI" should probably just be tabooed at this point.
These answers (or reasons why answering isn't useful) usually make sense if you're familiar with rationality and alignment, but some people are still missing a lot of the basics, and by repeatedly voicing these confusions they cause others to think those confusions are relevant and should be researched, which wastes a lot of time.
It should also be noted that some things are correct to be confused about. If you're researching a correlation or concept-generalization which doesn't actually exist in the territory, you're bound to get pretty confused! If you notice you're confused, ask yourself whether the question is even coherent or true, and whether figuring it out helps save the world.
Arguments about P(doom) are filtered for nonhazardousness
Some of the best arguments for high P(doom) / short timelines that someone could make would look like this:
It's not that hard to build an AI that kills everyone: you just need to solve [some problems] and combine the solutions. Considering how easy it is compared to what you thought, you should increase your P(doom) / shorten your timelines.
But obviously, if people had arguments of this shape,...