Redwood Research Blog

“Takeaways from sketching a control safety case” by Josh Clymer



Subtitle: Insights from a long technical paper compressed into a fun little commentary.

Buck Shlegeris and I recently published a paper with UK AISI that sketches a safety case for “AI control” – measures that improve safety despite intentional subversion from AI systems.

I would summarize this work as “turning crayon drawings of safety cases into blueprints.”

It's a long and technical paper! So I've compiled my main takeaways in this blog post.

The main figure from the paper summarizes how to run a control evaluation.

1. Control evaluations are way more complicated than I expected.

At a high level, control evaluations are conceptually simple. A red team constructs substitute models that stand in for subversive AI systems and try to cause proxy bad outcomes (e.g. exfiltrating dummy data). Developers then check whether the red team succeeds; if it does, the control measures aren't effective enough.
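That loop can be sketched as a toy simulation. Everything here is an illustrative assumption, not the paper's method: the attack rate, the monitor's catch rate, and the function names are all hypothetical, and a real control evaluation would use actual model substitutes and deployment-like environments rather than coin flips.

```python
import random

def run_control_eval(red_team_policy, control_measure, episodes=1000, seed=0):
    """Toy control evaluation: estimate how often the red-team substitute
    model achieves a proxy bad outcome despite the control measure."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        # The red team chooses an attack or a benign action this episode.
        action = red_team_policy(rng)
        # The bad outcome occurs only if an attack evades the control measure.
        if action == "attack" and not control_measure(action, rng):
            successes += 1
    return successes / episodes

# Hypothetical red-team policy: attack rarely, to make detection harder.
def rare_attack(rng):
    return "attack" if rng.random() < 0.05 else "benign"

# Hypothetical control measure: flag (and block) a given attack 90% of the time.
def monitor(action, rng):
    return rng.random() < 0.9

rate = run_control_eval(rare_attack, monitor)
print(f"proxy bad-outcome rate: {rate:.3f}")
```

Under these toy numbers the bad-outcome rate should come out well below the 5% attack rate; a developer would compare that estimated rate against a safety threshold when deciding whether the control measures are adequate.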

But there's so much [...]

---

First published:

January 30th, 2025

Source:

https://redwoodresearch.substack.com/p/takeaways-from-sketching-a-control

---

Narrated by TYPE III AUDIO.
