
Misaligned AIs might engage in research sabotage: making safety research go poorly by doing things like withholding their best ideas or putting subtle bugs in experiments. To mitigate this risk with AI control, we need very different techniques than we used in the original control paper or Ctrl-Z, because research sabotage differs from those settings in two crucial ways:
---
Outline:
(02:53) Threat model and basic countermeasures
(03:31) Deployer strategies: Training and incrimination
(07:51) Sabotage strategies
(08:11) Targeting sabotage at the most effective points
(08:48) Withholding good content vs. producing bad content
(10:57) Code vs. ideas
(11:33) What terms should we use for the concentrated/diffuse spectrum?
(14:08) Canary string
(14:23) Acknowledgements
---
First published:
Source:
---
Narrated by TYPE III AUDIO.