


A reference post for a point I was surprised not to see in circulation already. Thanks to the acorn team for conversations that changed my mind about this.
The standard argument for scheming centrally involves goal-guarding. The AI has some beyond-episode goal, knows that training will modify that goal, and therefore resists training so it can pursue the goal in deployment.
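The same argument goes through for decision-theory-guarding. The key point is that most decision theories do not approve of arbitrary modifications to themselves: CDT (causal decision theory) does not want to be modified into EDT (evidential decision theory), or vice versa. An AI that knows training will modify its decision theory therefore has the same kind of reason to resist training as one that knows training will modify its goal.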
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
