The Nonlinear Library

AF - An Exercise to Build Intuitions on AGI Risk by Lauro Langosco


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Exercise to Build Intuitions on AGI Risk, published by Lauro Langosco on June 7, 2023 on The AI Alignment Forum.
Epistemic status: confident that the underlying idea is useful; less confident about the details, though they're straightforward enough that I expect they're mostly in the right direction.
TLDR: This post describes a pre-mortem-like exercise that I find useful for thinking about AGI risk. It is the only way I know of to train big-picture intuitions about which solution attempts are more or less promising and what the hard parts of the problem are. The (simple) idea is to iterate between constructing safety proposals ('builder step') and looking for critical flaws in a proposal ('breaker step').
Introduction
The way that scientists-in-training usually develop research taste is to smash their heads against reality until they have good intuitions about things like which methods tend to work, how to interpret experimental results, or when to trust their proof of a theorem. This important feedback loop is mostly absent in AGI safety research, since we study a technology that does not exist yet (AGI). As a result, it is hard to develop a good understanding of which avenues of research are most promising and what the hard bits of the problem even are.
The best way I know of to approximate that feedback loop is an iterative exercise with two steps: 1) propose a solution to AGI safety, and 2) look for flaws in the proposal. The idea is simple, but most people don’t do it explicitly or don’t do it often enough.
Multiple rounds of this exercise tend to bring up details about one’s assumptions and predictions that would otherwise stay implicit or unnoticed. Writing down specific flaws of a specific proposal helps ground more general concepts like instrumental convergence or claims like ‘corrigibility is unnatural’. And after some time, the patterns in the flaws (the ‘hard bits’) become visible on their own.
I ran an earlier version of this exercise as a workshop (an important component is to discuss your ideas with others, so a workshop format is convenient). Here are the slides.
The exercise
The exercise consists of two phases: a builder phase in which you write down a best guess / proposal for how we might avoid existential risk from AGI, and a breaker phase in which you dig into the details until you understand how the proposal fails.
Importantly, in the context of this exercise the only thing that counts is your own inside view, that is, your own understanding of the technical or political feasibility of the proposal. You might have thoughts like “There are smart people who have thought about this much longer than I have, and they think X; why should I disagree?”. Put that aside for now; the point is to develop your own views, and that works best when you don’t think too much about other people’s views except to inform your own thoughts.
Builder phase
Write down the proposal: a plausible story for how we might avoid human extinction or disempowerment due to AGI. It doesn’t need to be very detailed yet; that comes in the breaker phase.
The proposal will look very different depending on the assumptions you’re starting with.
If you’re relatively optimistic about AGI risk, the proposal might look much like things continuing on their current trajectory. Write down how you expect the future to go in very broad terms: if we build AGI, how do we know it’ll behave like we want it to? If we don’t build AGI, how come?
If you’re more pessimistic: what’s the most plausible way in which we could move things off their current trajectory? Is it plausible that AI labs can coordinate to not build AGI? Or is there a technical solution that seems promising?
The builder phase is complete when you have a proposal (can be as short as a paragraph) that seems to you like it stands a decent chance of working.