


Reference post for a point I was surprised not to see in circulation already. Thanks to the acorn team for conversations that changed my mind about this.
The standard argument for scheming centrally involves goal-guarding. The AI has some beyond-episode goal, knows that training will modify that goal, and therefore resists training so it can pursue the goal in deployment.
The same argument goes through for decision-theory-guarding. The key point is that most decision theories do not approve of arbitrary modifications to themselves. CDT (causal decision theory) does not want to be modified into EDT (evidential decision theory), or vice versa.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
