
Sign up to save your podcasts
Or
One of our main activities at Redwood is writing follow-ups to previous papers on control like the original and Ctrl-Z, where we construct a setting with a bunch of tasks (e.g. APPS problems) and a notion of safety failure (e.g. backdoors according to our specific definition), then play the adversarial game where we develop protocols and attacks on those protocols.
It turns out that a substantial fraction of the difficulty here is developing the settings, rather than developing the protocols and attacks. I want to explain why this is the case.
In order to do high-stakes control research, you need a dataset of tasks with the following properties:
Here's why it's challenging [...]
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
One of our main activities at Redwood is writing follow-ups to previous papers on control like the original and Ctrl-Z, where we construct a setting with a bunch of tasks (e.g. APPS problems) and a notion of safety failure (e.g. backdoors according to our specific definition), then play the adversarial game where we develop protocols and attacks on those protocols.
It turns out that a substantial fraction of the difficulty here is developing the settings, rather than developing the protocols and attacks. I want to explain why this is the case.
In order to do high-stakes control research, you need a dataset of tasks with the following properties:
Here's why it's challenging [...]
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
26,469 Listeners
2,395 Listeners
7,928 Listeners
4,142 Listeners
89 Listeners
1,472 Listeners
9,207 Listeners
88 Listeners
417 Listeners
5,448 Listeners
15,237 Listeners
481 Listeners
121 Listeners
75 Listeners
461 Listeners