
Sign up to save your podcasts
Or
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback.
Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic's views or for that matter anyone's views but my own—it's just a collection of some of my personal thoughts.
First, some high-level thoughts on what I want to talk about here:
---
Outline:
(02:31) Why is catastrophic sabotage a big deal?
(02:45) Scenario 1: Sabotage alignment research
(05:01) Necessary capabilities
(06:37) Scenario 2: Sabotage a critical actor
(09:12) Necessary capabilities
(10:51) How do you evaluate a model's capability to do catastrophic sabotage?
(21:46) What can you do to mitigate the risk of catastrophic sabotage?
(23:12) Internal usage restrictions
(25:33) Affirmative safety cases
---
First published:
Source:
Narrated by TYPE III AUDIO.
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback.
Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic's views or for that matter anyone's views but my own—it's just a collection of some of my personal thoughts.
First, some high-level thoughts on what I want to talk about here:
---
Outline:
(02:31) Why is catastrophic sabotage a big deal?
(02:45) Scenario 1: Sabotage alignment research
(05:01) Necessary capabilities
(06:37) Scenario 2: Sabotage a critical actor
(09:12) Necessary capabilities
(10:51) How do you evaluate a model's capability to do catastrophic sabotage?
(21:46) What can you do to mitigate the risk of catastrophic sabotage?
(23:12) Internal usage restrictions
(25:33) Affirmative safety cases
---
First published:
Source:
Narrated by TYPE III AUDIO.
26,366 Listeners
2,383 Listeners
7,944 Listeners
4,137 Listeners
87 Listeners
1,459 Listeners
9,050 Listeners
88 Listeners
386 Listeners
5,422 Listeners
15,220 Listeners
473 Listeners
120 Listeners
76 Listeners
456 Listeners