I took a week off from my day job of aligning AI to visit Forethought and think about the question: if we can align AI, what should we do with it? This post summarizes the state of my thinking at the end of that week. (The proposal described here is my own, and is not in any way endorsed by Forethought.)
Thanks to Mia Taylor, Tom Davidson, Ashwin Acharya, and a whole bunch of other people (mostly at Forethought) for discussion and comments.
And a quick note: after writing this, I was told that Eric Drexler and David Dalrymple were thinking about a very similar idea in 2022, with essentially the same name. My thoughts here are independent of theirs.
The world around the time of ASI will be scary
I expect the time right around when the first ASI gets built to be chaotic, unstable, and scary. [...]
---
Outline:
- The world around the time of ASI will be scary
- The night-watchman ASI
- Three key properties
- The night watchman's responsibilities
- First and foremost: keeping the peace
- Minimally intrusive surveillance
- Preventing competing ASIs
- Preserving its own integrity
- Preventing premature claims to space
- Preventing other kinds of lock-in
- Preventing underhanded negotiation tactics
- Arguing for the three key properties above
- Getting everyone on board
- Protecting humanity in the short run
- No major lock-in
- Interpretive details
- Amending the night watchman's goals
- Modifications to the basic proposal
- Multiple subsystems
- An American night watchman and a Chinese night watchman overseeing each other
- Keeping the peace through soft power
- Conventional treaties
- Checks and balances
- The night watchman as a transition