January 24, 2025

“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt

24 minutes

One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thorough argument that the level of (existential) risk caused by the company is very low; perhaps they would require that the total chance of existential risk over the lifetime of the AI company[2] is less than 0.25%[3][4].

The idea of making high-assurance safety cases (once AI systems are dangerously powerful) is popular in some parts of the AI safety community and a variety of work appears to focus on this. Further, Anthropic has expressed an intention (in their RSP) to "keep risks below acceptable levels"[5] and there is a common impression that Anthropic would pause [...]

---

Outline:

(03:19) Why are companies unlikely to succeed at making high-assurance safety cases in short timelines?

(04:14) Ensuring sufficient security is very difficult

(04:55) Sufficiently mitigating scheming risk is unlikely

(09:35) Accelerating safety and security with earlier AIs seems insufficient

(11:58) Other points

(14:07) Companies likely won't unilaterally slow down if they are unable to make high-assurance safety cases

(18:26) Could coordination or government action result in high-assurance safety cases?

(19:55) What about safety cases aiming at a higher risk threshold?

(21:57) Implications and conclusions

The original text contained 20 footnotes which were omitted from this narration.

---

First published:
January 23rd, 2025

Source:
https://www.lesswrong.com/posts/neTbrpBziAsTH5Bn7/ai-companies-are-unlikely-to-make-high-assurance-safety

---

Narrated by TYPE III AUDIO.
