


Misaligned AIs might engage in research sabotage: making safety research go poorly by doing things like withholding their best ideas or putting subtle bugs in experiments. To mitigate this risk with AI control, we need very different techniques than we used in the original control paper or Ctrl-Z, because research sabotage differs from those settings in two crucial ways:
---
Outline:
(02:53) Threat model and basic countermeasures
(03:31) Deployer strategies: Training and incrimination
(07:51) Sabotage strategies
(08:11) Targeting sabotage at the most effective points
(08:48) Withholding good content vs. producing bad content
(10:57) Code vs. ideas
(11:33) What terms should we use for the concentrated/diffuse spectrum?
(14:08) Canary string
(14:23) Acknowledgements
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
By LessWrong
