The Nonlinear Library

LW - AI Safety via Luck by Jozdien


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety via Luck, published by Jozdien on April 1, 2023 on LessWrong.
Epistemic Status: I feel confident and tentatively optimistic about the claims made in this post, but am slightly more uncertain about how it generalizes. Additionally, I am concerned about the extent to which this is dual-use for capabilities and exfohazardous, and I spent a few months thinking about whether it was worth releasing this post regardless. I haven't come to an answer yet, so I'm publishing it so that other people can see it and let me know what they think I should do.
TL;DR: I propose a research direction to solve alignment that potentially doesn’t require solutions to ontology identification, learning how to code, or becoming literate.
Introduction
Until a few hours ago, I was spending my time primarily working on high-level interpretability and cyborgism. While writing a draft for something else I was working on, an activity that usually yields me a lot of free time by way of procrastination, I stumbled across the central idea behind this post. It seemed so immediately compelling that I dropped everything else to work on it, culminating, after much deliberation, in the post you see before you.
My intention with this post is to provide a definitive reference for what it would take to safely use AGI to steer our world toward much better states in the absence of a solution to any or all of several existing problems, such as Eliciting Latent Knowledge, conditioning simulator models, Natural Abstractions, mechanistic interpretability, and the like.
In a world with prospects such as those, I propose that we radically rethink our approach to AGI safety. Instead of dedicating enormous effort to engineering nigh-impossible safety measures, we should consider thus-far-neglected avenues of research, especially ones that have so far been unfairly disprivileged for memetic reasons, which also immunizes them against capabilities misuse. To avert the impending AI apocalypse, we need to focus on high-variance, low-probability-high-yield ideas: lightning strikes that, should they occur, solve astoundingly complex problems in one fell swoop. A notable example, which I claim we should be investing all of our efforts into, is luck. Yes, luck!
Luck As A Viable Strategy
I suggest that we pay greater attention to luck, both as a powerful factor enhancing other endeavors and as an independent research direction in its own right. Humanity has, over the centuries, devoted immense amounts of cumulative cognition to exploring and optimizing for luck, so one might naively think that there's little tractability left. I believe, however, that contemporary rationality and cultural devices have generated an immense amount of alpha that can vastly improve the efficiency of steering luck toward a highly specific target.
Consider the following: if we were to offer a $1,000,000 prize to the next person who walks into the MIRI offices, that person would clearly be the luckiest person on the planet. It follows, then, that this lucky individual would have an uncannily high probability of finally cracking the alignment problem. I understand that, prima facie, this proposal may seem absurd, but I strongly suggest abandoning the representativeness heuristic and evaluating what is rather than what seems to be, especially since that initial absurdity is intrinsic to why this strategy is competitive at all.
It's like being granted three wishes by a genie. Instead of wishing for more wishes (which is the usual strategy), we should wish to be the luckiest person in the world—with that power, we can then stumble upon AGI alignment almost effortlessly, and make our own genies.
Think of it this way: throughout history, many great discover...