June 03, 2024

“AI catastrophes and rogue deployments” by Buck

14 minutes

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Thanks to Aryan Bhatt, Ansh Radhakrishnan, Adam Kaufman, Vivek Hebbar, Hanna Gabor, Justis Mills, Aaron Scher, Max Nadeau, Ryan Greenblatt, Peter Barnett, Fabien Roger, and various people at a presentation of these arguments for comments. These ideas aren’t very original to me; many of the examples of threat models are from other people.]

In this post, I want to introduce the concept of a “rogue deployment” and argue that it’s interesting to classify possible AI catastrophes based on whether or not they involve a rogue deployment. I’ll also talk about how this division interacts with the structure of a safety case, discuss two important subcategories of rogue deployment, and make a few points about how the different categories I describe here might be caused by different attackers (e.g. the AI itself, rogue lab insiders, external hackers, or [...]

---

Outline:

(01:01) Rogue deployments

(04:01) Rogue deployments and safety cases

(05:10) More on catastrophes with rogue deployment

(10:04) More on catastrophes without rogue deployment

(13:03) Different attacker profiles

(13:46) Hot takes that I can say using the above concepts

---

First published:

June 3rd, 2024

Source: