
Sign up to save your podcasts
Or


Subtitle: A new control setting for more realistic software engineering deployments.
We are releasing LinuxArena, a new control setting comprised of 20 software engineering environments. Each environment consists of a set of SWE tasks, a set of possible safety failures, and a set of covert sabotage trajectories that cause these safety failures. LinuxArena can be used to evaluate the risk of AI deployments by assessing AI's sabotage capabilities, assessing control mitigations such as monitoring, and developing novel control techniques. This post will cover what LinuxArena is and the basics of its use cases.
I’m excited about LinuxArena because it is a much more complex control setting with many moving parts and enables a lot of new control research. It was already used in the Claude Mythos Preview System Card to measure Mythos's sabotage ability. We are happy to help AI companies or other organizations set up LinuxArena for their own evaluations or research!
LinuxArena
AI companies are increasingly deploying coding agents on real codebases with running services, databases, and complex permissions. But the control settings we use to evaluate whether these deployments are safe are far simpler than the deployments themselves. The APPS backdoor setting is a single-turn [...]
---
Outline:
(01:10) LinuxArena
(03:28) LinuxArena as a sabotage risk evaluation
(03:53) Sabotage evals
(05:35) Monitor evals
(06:35) Control Evals
(08:52) Conclusion
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By Redwood ResearchSubtitle: A new control setting for more realistic software engineering deployments.
We are releasing LinuxArena, a new control setting comprised of 20 software engineering environments. Each environment consists of a set of SWE tasks, a set of possible safety failures, and a set of covert sabotage trajectories that cause these safety failures. LinuxArena can be used to evaluate the risk of AI deployments by assessing AI's sabotage capabilities, assessing control mitigations such as monitoring, and developing novel control techniques. This post will cover what LinuxArena is and the basics of its use cases.
I’m excited about LinuxArena because it is a much more complex control setting with many moving parts and enables a lot of new control research. It was already used in the Claude Mythos Preview System Card to measure Mythos's sabotage ability. We are happy to help AI companies or other organizations set up LinuxArena for their own evaluations or research!
LinuxArena
AI companies are increasingly deploying coding agents on real codebases with running services, databases, and complex permissions. But the control settings we use to evaluate whether these deployments are safe are far simpler than the deployments themselves. The APPS backdoor setting is a single-turn [...]
---
Outline:
(01:10) LinuxArena
(03:28) LinuxArena as a sabotage risk evaluation
(03:53) Sabotage evals
(05:35) Monitor evals
(06:35) Control Evals
(08:52) Conclusion
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.