The Nonlinear Library

EA - Reasons for optimism about measuring malevolence to tackle x- and s-risks by Jamie Harris

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reasons for optimism about measuring malevolence to tackle x- and s-risks, published by Jamie Harris on April 2, 2024 on The Effective Altruism Forum.
Reducing the influence of malevolent actors seems useful for reducing existential risks (x-risks) and risks of astronomical suffering (s-risks). One promising strategy for doing this is to develop manipulation-proof measures of malevolence.
I think better measures would be useful because:
We could use them with various high-leverage groups, like politicians or AGI lab staff.
We could use them flexibly (for information-only purposes) or with hard cutoffs.
We could use them in initial selection stages, before promotions, or during reviews.
We could spread them more widely via HR companies or personal genomics companies.
We could use small improvements in measurements to secure early adopters.
I think we can make progress on developing and using them because:
It's neglected, so there will be low-hanging fruit.
There's historical precedent for tests and screening.
We can test on EA orgs.
Progress might be profitable.
The cause area has mainstream potential.
So let's get started on some concrete research!
Context
~4 years ago, David Althaus and Tobias Baumann posted about the impact potential of "Reducing long-term risks from malevolent actors". They argued that:
Dictators who exhibited highly narcissistic, psychopathic, or sadistic traits were involved in some of the greatest catastrophes in human history. Malevolent individuals in positions of power could negatively affect humanity's long-term trajectory by, for example, exacerbating international conflict or other broad risk factors.
Malevolent humans with access to advanced technology - such as whole brain emulation or other forms of transformative AI - could cause serious existential risks and suffering risks… Further work on reducing malevolence would be valuable from many moral perspectives and constitutes a promising focus area for longtermist EAs.
I and many others were impressed with the post. It got lots of upvotes on the EA Forum, and 80,000 Hours listed it as an area that they'd be "equally excited to see some of our readers… pursue" as the problems on their list of the most pressing world problems. But I haven't seen much progress on the topic since.
One of the main categories of interventions that Althaus and Baumann proposed was "The development of manipulation-proof measures of malevolence… [which] could be used to screen for malevolent humans in high-impact settings, such as heads of government or CEOs." Anecdotally, I've encountered scepticism that this would be either tractable or particularly useful, which surprised me.
I seem to be more optimistic than anyone I've spoken to about it, so I'm writing up some thoughts explaining my intuitions.
My research has historically been of the form: "assuming we think X is good, how do we make X happen?" This post is in a similar vein, except it's more 'initial braindump' than 'research'. It focuses more on steelmanning the case for this work than on reaching a balanced, overall assessment.
I think better measures would be useful
We could use difficult-to-game measures of malevolence with various high-leverage groups:
Political candidates.
Civil servants and others involved in the policy process.
Staff at A(G)I labs.
Staff at organisations inspired by effective altruism.
Some of these groups might be more tractable to focus on first, e.g. EA orgs. And we could test in less risky environments first, e.g. smaller AI companies before frontier labs, or bureaucratic policy positions before public-facing political roles.
The measures could be binding or used flexibly, for information-only purposes. For example, in a hiring process, there could either be some malevolence threshold above which a candidate is rejected without question, or test(s) for malevol...