Redwood Research Blog

“Takeaways from sketching a control safety case” by Josh Clymer



Subtitle: Insights from a long technical paper compressed into a fun little commentary.

Buck Shlegeris and I recently published a paper with UK AISI that sketches a safety case for “AI control” – measures that improve safety despite intentional subversion from AI systems.

I would summarize this work as “turning crayon drawings of safety cases into blueprints.”

It's a long and technical paper! So I've compiled my main takeaways in this blog post.

The main figure from the paper summarizes how to run a control evaluation.

1. Control evaluations are way more complicated than I expected.

At a high level, control evaluations are conceptually simple. A red team constructs substitute models that stand in for subversive AI systems and try to cause proxy bad outcomes (e.g. exfiltrating dummy data). Developers then check whether the red team succeeds; if it does, the control measures aren't effective enough.
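That loop can be sketched as a toy simulation. Everything here is an illustrative assumption, not the paper's method: the attack rate, the monitor's catch rate, and the function names are all hypothetical, and a real control evaluation would use actual model substitutes and deployment-like environments rather than coin flips.

```python
import random

def run_control_eval(red_team_policy, control_measure, episodes=1000, seed=0):
    """Toy control evaluation: estimate how often the red-team substitute
    model achieves a proxy bad outcome despite the control measure."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        # The red team chooses an attack or a benign action this episode.
        action = red_team_policy(rng)
        # The bad outcome occurs only if an attack evades the control measure.
        if action == "attack" and not control_measure(action, rng):
            successes += 1
    return successes / episodes

# Hypothetical red-team policy: attack rarely, to make detection harder.
def rare_attack(rng):
    return "attack" if rng.random() < 0.05 else "benign"

# Hypothetical control measure: flag (and block) a given attack 90% of the time.
def monitor(action, rng):
    return rng.random() < 0.9

rate = run_control_eval(rare_attack, monitor)
print(f"proxy bad-outcome rate: {rate:.3f}")
```

Under these toy numbers the bad-outcome rate should come out well below the 5% attack rate; a developer would compare that estimated rate against a safety threshold when deciding whether the control measures are adequate.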

But there's so much [...]

---

First published:

January 30th, 2025

Source:

https://redwoodresearch.substack.com/p/takeaways-from-sketching-a-control

---

Narrated by TYPE III AUDIO.
