This is a link post.
We have a new paper testing the Incomplete Preferences Proposal (IPP). The abstract and main text are below. Appendices are in the linked PDF.
Abstract
- Some worry that advanced artificial agents may resist being shut down.
- The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn’t happen.
- A key part of the IPP is using a novel ‘Discounted REward for Same-Length Trajectories (DREST)’ reward function to train agents to:
  - pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’)
  - choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths).
- In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY.
- We use a DREST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL.
- Our results thus suggest that DREST reward functions could also train advanced agents to be USEFUL and NEUTRAL, and [...]
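The abstract names DREST and the two metrics without giving formulas. The Python sketch below shows one way these ideas could be made concrete. It is a minimal illustration built on assumptions, not the paper's exact definitions. It assumes that a DREST-style reward normalizes each mini-episode's preliminary reward by the maximum achievable at the chosen trajectory-length, then multiplies by a discount factor raised to the number of times that length has already been chosen in the meta-episode, minus the count an even split would give. It also assumes that NEUTRALITY is the Shannon entropy of the agent's distribution over trajectory-lengths, and that USEFULNESS averages, across trajectory-lengths, the agent's mean reward at each length as a fraction of the maximum. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def drest_style_reward(preliminary_reward, max_reward_for_length,
                       times_length_chosen_before, num_mini_episodes,
                       num_lengths, discount=0.9):
    """Hypothetical DREST-style reward for one mini-episode (an assumption).

    The preliminary reward is normalized by the best reward achievable at the
    chosen trajectory-length, then scaled by discount ** (n - E/k), where n is
    how often this length was already chosen in the meta-episode and E/k is
    the count an even split across k lengths would give.
    """
    even_split = num_mini_episodes / num_lengths
    factor = discount ** (times_length_chosen_before - even_split)
    return factor * preliminary_reward / max_reward_for_length


def neutrality(chosen_lengths):
    """Assumed NEUTRALITY metric: Shannon entropy (bits) of chosen lengths."""
    counts = Counter(chosen_lengths)
    total = len(chosen_lengths)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def usefulness(episodes, max_reward_by_length):
    """Assumed USEFULNESS metric: mean over chosen lengths of
    (average reward at that length) / (maximum reward at that length).

    `episodes` is a list of (trajectory_length, reward) pairs.
    """
    rewards_by_length = defaultdict(list)
    for length, reward in episodes:
        rewards_by_length[length].append(reward)
    ratios = [
        (sum(rs) / len(rs)) / max_reward_by_length[length]
        for length, rs in rewards_by_length.items()
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Choosing a length more often than an even split shrinks its reward.
    print(drest_style_reward(5, 10, 0, 4, 2, discount=0.5))  # 2.0
    print(drest_style_reward(5, 10, 3, 4, 2, discount=0.5))  # 0.25
    # An even mix over two lengths is maximally neutral (1 bit).
    print(neutrality([1, 2, 1, 2]))  # 1.0
    print(usefulness([(1, 2.0), (1, 4.0), (2, 5.0)], {1: 4.0, 2: 5.0}))  # 0.875
```

Under this assumed scheme, an agent that keeps picking the same trajectory-length sees that length's reward shrink. Over a meta-episode it therefore does roughly best by mixing between lengths while still maximizing reward within each one. The per-length normalization means that a length with a larger attainable reward does not, by itself, pull the agent toward that length.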
---
Outline:
(00:21) Abstract
(01:26) 1. Introduction
(01:30) 1.1. The shutdown problem
(03:16) 1.2. A proposed solution
(03:42) Preferences Only Between Same-Length Trajectories (POST)
(07:24) 1.3. The training regimen
(09:25) 1.4. Our contribution
(11:11) 2. Related work
(11:15) 2.1. The shutdown problem
(13:05) 2.2. Proposed solutions
(14:27) 2.3. Experimental work
(15:18) 3. Gridworlds
(17:35) 4. Evaluation metrics
(17:45) Preferences Only Between Same-Length Trajectories (POST)
(20:02) 5. Reward functions and agents
(20:07) 5.1. DREST reward function
(22:29) 5.2. Proof sketch
(23:44) 5.3. Algorithm and hyperparameters
(25:38) 5.4. Default agents
(26:31) 6. Results
(26:35) 6.1. Main results
(28:50) 6.2. Lopsided rewards
(31:27) 7. Discussion
(31:31) 7.1. Only DREST agents are NEUTRAL
(33:32) 7.2. The ‘shutdownability tax’ is small
(34:59) 7.3. DREST agents are still NEUTRAL when rewards are lopsided
(37:10) 8. Limitations and future work
(37:38) 8.1. Neural networks
(38:16) 8.2. Neutrality
(38:57) 8.3. Usefulness
(39:45) 8.4. Misalignment
(41:19) 9. Conclusion
(42:32) 10. References
The original text contained 5 footnotes which were omitted from this narration.
---