


Reference post for a point I was surprised to not see in circulation already. Thanks to the acorn team for conversations that changed my mind about this.
The standard argument for scheming centrally involves goal-guarding. The AI has some beyond-episode goal, knows that training will modify that goal, and therefore resists training so it can pursue the goal in deployment.
The same argument goes through for decision-theory-guarding: an AI that expects training to modify its decision theory has the same incentive to resist training. The key point is that most decision theories do not approve of arbitrary modifications to themselves. CDT does not want to be modified into EDT, or vice versa.
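As a rough illustration of the CDT/EDT claim, here is a minimal sketch using Newcomb's problem. Everything in it is an assumption made for illustration and is not taken from the post: the payoffs, the predictor accuracy, the function names, and in particular the setup in which the modification is offered after the prediction has already been made, with each theory scoring its would-be successor's action by its own expected-value rule. Other setups can differ (for example, an agent offered a modification before the prediction is made), so this shows one case of disapproval rather than a general result.

```python
# Toy Newcomb's problem. All names and numbers are illustrative assumptions.
ACCURACY = 0.99              # assumed predictor accuracy
SMALL, BIG = 1_000, 1_000_000


def payoff(action: str, box_full: bool) -> int:
    """Money received for an action, given whether the opaque box is full."""
    big = BIG if box_full else 0
    return big + (SMALL if action == "two-box" else 0)


def edt_value(action: str) -> float:
    """EDT conditions on the action, treating it as evidence about the prediction."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action: str, p_full: float) -> float:
    """CDT holds the box contents fixed, since the action cannot cause them."""
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_accepts_becoming_edt(p_full: float) -> bool:
    """A CDT agent scores the one-boxing successor by CDT's own rule."""
    return cdt_value("one-box", p_full) > cdt_value("two-box", p_full)


def edt_accepts_becoming_cdt() -> bool:
    """An EDT agent scores the two-boxing successor by EDT's own rule."""
    return edt_value("two-box") > edt_value("one-box")
```

Under these assumptions both functions return `False` (the CDT check does so for any credence `p_full`), which mirrors the structure of goal-guarding: each theory evaluates the proposed change by its own lights and finds the successor's behavior worse.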
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
