Misaligned AIs might engage in research sabotage: making safety research go poorly by doing things like withholding their best ideas or putting subtle bugs in experiments. To mitigate this risk with AI control, we need very different techniques than we used in the original control paper or Ctrl-Z, because research sabotage differs from those settings in two crucial ways:
---
Outline:
(02:53) Threat model and basic countermeasures
(03:31) Deployer strategies: Training and incrimination
(07:51) Sabotage strategies
(08:11) Targeting sabotage at the most effective points
(08:48) Withholding good content vs. producing bad content
(10:57) Code vs. ideas
(11:33) What terms should we use for the concentrated/diffuse spectrum?
(14:08) Canary string
(14:23) Acknowledgements
---
First published:
Source:
Narrated by TYPE III AUDIO.