Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ideas for improving epistemics in AI safety outreach, published by mic on August 21, 2023 on LessWrong.
In 2022 and 2023, there has been a growing focus on recruiting talented individuals to work on mitigating the potential existential risks posed by artificial intelligence. For example, we've seen an increase in the number of university clubs, retreats, and workshops dedicated to introducing people to the issue of existential risk from AI.
However, these efforts might foster an environment with suboptimal epistemics. Given the goal of enabling people to contribute positively to AI safety, there's an incentive to focus on that without worrying as much about whether our arguments are solid. Many people working on field building are not domain experts in AI safety or machine learning but are motivated by a belief that AI safety is an important issue. Some participants may believe that addressing the risks associated with AI is important without fully understanding their own reasoning behind this belief or having engaged with strong counterarguments.
This post is a brief examination of this issue and suggests some ideas to improve epistemics in outreach efforts.
Note: I first drafted this in December 2022. Since then, concern about AI x-risk has been increasingly discussed in the mainstream, so AI safety field builders should hopefully be using fewer weird, epistemically poor arguments. Even so, I think epistemics are still relevant to discuss after a recent post noted poor epistemics in EA community building.
What are some ways that AI safety field building may be epistemically unhealthy?
Organizers may promote arguments for AI safety that are (comparatively) compelling yet flawed
Advancing arguments promoting the importance of AI safety while neglecting opposing arguments
E.g., citing that x% of researchers believe that AI has a y% chance of causing an existential catastrophe, without the caveat that experts have widely differing views
Confidently making arguments that are flawed or have insufficiently justified premises
E.g., claiming that instrumental convergence is inevitable, assuming that AIs are maximizing for reward (see Reward is not the optimization target, although there are also comments disagreeing with this)
See also: Rohin Shah's comment here about how few people can make an argument for working on AI x-risk that he doesn't think is obviously flawed
Simultaneously, I think that most ML people don't find AI safety arguments particularly compelling.
It's easy to form the perception that arguments in favor of AI safety are "supposed" to be the more correct ones. People might feel hesitant to voice disagreements.
In a reading group (such as one based on AI Safety Fundamentals), people may go along with the arguments from the readings or what the discussion facilitator says - deferring to authority and being hesitant to think through arguments themselves.
People may participate in reading groups but skim the readings, and walk away believing the conclusions without understanding the arguments; or they may notice they are confused but walk away believing the conclusions regardless.
Why are good epistemics valuable?
To do productive research, we want to avoid having an understanding of AI x-risk that is obviously flawed
"incorrect arguments lead to incorrect beliefs which lead to useless solutions" (from Rohin Shah)
Bad arguments are bad for persuading people (or at least, it seems bad if you can't anticipate common objections from the ML community)
People making bad arguments is bad for getting people to do useful work
Attract more people with good epistemics
For the sake of epistemic rigor, I'll also make a few possible arguments about why epistemics may be overrated.
Perhaps people can do useful work even if they don't have an inside view of why AI ...