
Sign up to save your podcasts
Or


We’ve just released a new paper: Retrying vs Resampling in AI Control. We revisit the resampling protocols introduced in Ctrl-Z with an up-to-date setting and much stronger models, and compare them against “retrying” protocols similar to Claude Code auto mode or Codex Auto-review.
Motivation
Roughly a year ago we released Ctrl-Z, the first paper to study control techniques for agents. A headline result of that paper was the performance of resample protocols – strategies that involve taking multiple i.i.d. samples from the model per step. But since Ctrl-Z, models have gotten much stronger, and we have built more sophisticated control settings to keep up. We wanted to answer the following questions:
How well do the results from Ctrl-Z hold up with better models and a better setting? Current high stakes control research is trying to learn by analogy about how to do control effectively in a real high stakes deployment during a real intelligence explosion. Findings about technique performance1 are going to have to generalize pretty far to be useful.
If the resample protocols from Ctrl-Z still work, what makes them work? One way we try to make our work more generalizable is by understanding the dynamics governing outcomes [...]
---
Outline:
(00:30) Motivation
(02:35) TL;DR Takeaways
(04:34) Methodology
(07:14) Differences from Ctrl-Z
(12:14) Are Retrying Protocols Exploitable?
(15:25) Cost and Latency of Resampling
(17:24) Conclusion
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By Redwood ResearchWe’ve just released a new paper: Retrying vs Resampling in AI Control. We revisit the resampling protocols introduced in Ctrl-Z with an up-to-date setting and much stronger models, and compare them against “retrying” protocols similar to Claude Code auto mode or Codex Auto-review.
Motivation
Roughly a year ago we released Ctrl-Z, the first paper to study control techniques for agents. A headline result of that paper was the performance of resample protocols – strategies that involve taking multiple i.i.d. samples from the model per step. But since Ctrl-Z, models have gotten much stronger, and we have built more sophisticated control settings to keep up. We wanted to answer the following questions:
How well do the results from Ctrl-Z hold up with better models and a better setting? Current high stakes control research is trying to learn by analogy about how to do control effectively in a real high stakes deployment during a real intelligence explosion. Findings about technique performance1 are going to have to generalize pretty far to be useful.
If the resample protocols from Ctrl-Z still work, what makes them work? One way we try to make our work more generalizable is by understanding the dynamics governing outcomes [...]
---
Outline:
(00:30) Motivation
(02:35) TL;DR Takeaways
(04:34) Methodology
(07:14) Differences from Ctrl-Z
(12:14) Are Retrying Protocols Exploitable?
(15:25) Cost and Latency of Resampling
(17:24) Conclusion
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.