Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Model-based Approach to AI Existential Risk, published by Samuel Dylan Martin on August 25, 2023 on The AI Alignment Forum.
Introduction
Polarisation hampers cooperation and progress towards understanding whether future AI poses an existential risk to humanity and how to reduce the risks of catastrophic outcomes. It is exceptionally challenging to pin down what these risks are and what decisions are best. We believe that a model-based approach offers many advantages for improving our understanding of risks from AI, estimating the value of mitigation policies, and fostering communication between people on different sides of AI risk arguments. We also believe that a large percentage of practitioners in the AI safety and alignment communities have appropriate skill sets to successfully use model-based approaches.
In this article, we will lead you through an example application of a model-based approach for the risk of an existential catastrophe from unaligned AI: a probabilistic model based on Carlsmith's Is Power-seeking AI an Existential Risk? You will interact with our model, explore your own assumptions, and (we hope) develop your own ideas for how this type of approach might be relevant in your own work. You can find a link to the model here.
In many poorly understood areas, people gravitate to advocacy positions. We see this with AI risk, where writers commonly dismiss others as "AI doomers" or "AI accelerationists". People on each side of this debate struggle to communicate their ideas to the other, since advocacy often carries biases and interprets evidence within a framework the other side does not share.
In other domains, we have witnessed first-hand that model-based approaches are a constructive way to cut through advocacy like this. For example, by leveraging a model-based approach, the Rigs-to-Reefs project reached near consensus among 22 diverse organisations on the contentious problem of how to decommission the huge oil platforms off the Santa Barbara coast. For decades, environmental groups, oil companies, marine biologists, commercial and recreational fishermen, shipping interests, legal defence funds, the State of California, and federal agencies were stuck in an impasse on this issue. The introduction of a model refocused the dialogue on specific assumptions, objectives and options, and led to 20 out of the 22 organisations agreeing on the same plan. The California legislature encoded this plan into law with bill AB 2503, which passed almost unanimously.
There is a lot of uncertainty around existential risks from AI, and the stakes are extremely high. In situations like this, we advocate quantifying uncertainty explicitly using probability distributions. Sadly, this is not as common as it should be, even in domains where such techniques would be most useful.
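To make the idea concrete, here is a minimal sketch of what "quantifying uncertainty explicitly using probability distributions" can mean in practice: representing an uncertain probability as a distribution and summarising it with a credible interval, rather than committing to a single point estimate. The Beta parameters below are hypothetical placeholders, not values from the article or from Carlsmith's report.

```python
# A minimal sketch of expressing uncertainty as a probability distribution
# rather than a point estimate. The Beta parameters are hypothetical
# placeholders chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Suppose an analyst judges a probability to be "around 10%, but plausibly
# anywhere from a few percent to 25%". One way to encode that judgement is
# a Beta distribution instead of a single number.
samples = rng.beta(a=2.0, b=18.0, size=100_000)  # mean = 2 / (2 + 18) = 0.10

print(f"mean:               {samples.mean():.3f}")
print(f"90% credible range:  {np.percentile(samples, 5):.3f} "
      f"to {np.percentile(samples, 95):.3f}")
```

The payoff is that downstream calculations can carry the full distribution forward, so disagreements can be located in specific inputs rather than in a single contested bottom-line number.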
A recent paper on the risks of unaligned AI by Joe Carlsmith (2022) is a powerful illustration of how probabilistic methods can help assess whether advanced AI poses an existential risk to humanity. In this article, we review Carlsmith's argument and incorporate his problem decomposition into our own Analytica model. We then expand on this starting point to demonstrate elementary ways to approach each of the distinctive challenges of the x-risk domain. We take you on a tour of the live model to learn about its elements and enable you to dive deeper on your own.
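As a rough sketch of the kind of propagation the Analytica model performs, the snippet below pushes uncertainty through a Carlsmith-style decomposition by Monte Carlo: each premise is an uncertain conditional probability, and the headline risk is their product. The premise labels paraphrase Carlsmith's report; the specific Beta distributions are hypothetical placeholders and are not the estimates used in our model.

```python
# A minimal sketch of Monte Carlo propagation through a Carlsmith-style
# decomposition. Premise labels paraphrase Carlsmith (2022); the Beta
# distributions are hypothetical placeholders, not the article's estimates.
import numpy as np

rng = np.random.default_rng(42)
N = 200_000  # number of Monte Carlo samples

# Each premise is a conditional probability, itself uncertain, so we
# represent it as a distribution rather than a point estimate.
premises = {
    "APS systems become feasible and are built":        rng.beta(8, 2, N),
    "strong incentives to deploy them":                  rng.beta(8, 3, N),
    "harder to build aligned than misaligned systems":   rng.beta(4, 6, N),
    "misaligned systems seek power in high-impact ways": rng.beta(3, 7, N),
    "power-seeking scales to full disempowerment":       rng.beta(3, 7, N),
    "disempowerment constitutes existential catastrophe": rng.beta(9, 1, N),
}

# The headline risk is the product of the conditional premise probabilities,
# computed sample by sample so the output is itself a distribution.
p_catastrophe = np.prod(np.stack(list(premises.values())), axis=0)

print(f"mean P(catastrophe):    {p_catastrophe.mean():.3%}")
print(f"median:                 {np.median(p_catastrophe):.3%}")
print(f"90% credible interval:  {np.percentile(p_catastrophe, 5):.3%} "
      f"to {np.percentile(p_catastrophe, 95):.3%}")
```

The live Analytica model plays the same role, but with an explicit influence diagram, editable assumptions, and sensitivity analysis, which is what makes it useful as a shared object of discussion rather than a private calculation.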
Challenges
Predicting the long-term future is always challenging. The difficulty is amplified when there is no historical precedent. But this challenge is not unique; we lack historical precedent in many other areas, for example when considering a novel government program or a fundamentally new business initiative. We also lack precedent when world conditions change due to changes in technology, ...