The Nonlinear Library

LW - Contingency: A Conceptual Tool from Evolutionary Biology for Alignment by clem acs



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Contingency: A Conceptual Tool from Evolutionary Biology for Alignment, published by clem acs on June 12, 2023 on LessWrong.
1. Motivation
As AI systems become increasingly powerful, it becomes increasingly likely that they develop dangerous features. A key concern of AI alignment researchers is that some dangerous features, such as power-seeking or situational awareness, are convergent: they would likely arise regardless of the particular trajectory AI development takes. In a recent tweet, Eliezer Yudkowsky argues that this is the key question for AI alignment research:
> From my perspective, we are trying to evaluate a picture of convergent ruin rather than any pictures of contingent ruin. That is, I'm relatively less interested in contingent-doom scenarios like "What if superintelligence is easily alignable, but Greenpeace is first to obtain superintelligence and they align It entirely to protect Earth's environment and then the SI wipes out humanity to do that?" because this is not a case for almost-all paths leading to ruin. I worry that contingent scenarios like "Age of Ultron" can fascinate humans (particularly contingent scenarios where some Other Monkey Faction Gets AGI First oh noes), and derail all possibility of a discussion about whether, in fact, the superintelligence kills everyone regardless of who 'gets' It first.
> Similarly, if somebody is considering only the contingent-early-ruin scenario of "a prototype of a somewhat advanced AGI starts to consider how to wipe out humanity, and nobody spots any signs of that thought", that contingency can mask a convergent-ruin continuation of "even if they spot the early signs, everyone is still in an arms-race scenario, so they optimize away the visible signs of doom and keep going into ruin".
> So I think the key question is "Why would or wouldn't humanity be convergently doomed?" rather than "How might humanity be contingently doomed?" Convergent doom isn't the same as inevitable doom, but ought to be the sort of doom that defeats easy solution / historically characteristic levels of competence for Earth / the sort of efforts we've seen already.
> A doom that can only be averted by an effective multinational coalition shutting down all AI runs larger than GPT-4 for 6 years, to make time for a crash project on human intelligence augmentation that has to complete within 6 years, may not be an inevitable doom; but it was probably a pretty convergent doom if that was your best path for stopping it.
> Conversely, "What if all the AI companies and venture capitalists just decide together to not pursue the rewards of AI models stronger than GPT-4, and high-impact journals and conferences stop accepting papers about it" is a sufficiently unlikely contingency that dooms avertable in that way may still be said to be convergent. It seems reasonably convergent that, if there isn't a drastic effective global shutdown enforced by state-level actors that will enforce that treaty even on non-signatory states, somebody tries.
This framing introduces “contingency” as the opposite of convergence: features of systems that are convergent are, by definition, those that are not contingent. In this piece I suggest an alternative framing, in which convergent features are themselves often contingent on some other feature. The framing is initially unintuitive, but once you have seen it, it comes up everywhere. In the case of alignment, it would allow us to take some dangerous feature of AI systems which we expect to be convergent and ask “what is this feature contingent on?”. My hope is that this shift in perspective can reveal implicit assumptions and thus allow us to spot new potential solutions for AI alignment.
The field where this shift in perspective is most salient ...
