Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: «Boundaries» and AI safety compilation, published by Chipmonk on May 3, 2023 on LessWrong.
In this post I outline every post I could find that meaningfully connects the concept of «Boundaries» with AI safety. This seems to be a booming subtopic: interest has picked up substantially within the past year.
Perhaps most notably, Davidad includes the concept in his Open Agency Architecture for Safe Transformative AI alignment paradigm. For a preview of the salience of this approach, see this comment by Davidad (2023 Jan):
“defend the boundaries of existing sentient beings,” which is my current favourite. It’s nowhere near as ambitious or idiosyncratic as “human values”, yet nowhere near as anti-natural or buck-passing as corrigibility.
This post also compiles recent work from Andrew Critch, Scott Garrabrant, John Wentworth, and others. But first I will recap what «Boundaries» are:
«Boundaries» definition recap:
You can see the «Boundaries» Sequence for a longer explanation, but here I will excerpt from a more recent post by Andrew Critch (2023 March):
By boundaries, I just mean the approximate causal separation of regions in some kind of physical space (e.g., spacetime) or abstract space (e.g., cyberspace). Here are some examples from my «Boundaries» Sequence:
a cell membrane (separates the inside of a cell from the outside);
a person's skin (separates the inside of their body from the outside);
a fence around a family's yard (separates the family's place of living-together from neighbors and others);
a digital firewall around a local area network (separates the LAN and its users from the rest of the internet);
a sustained disassociation of social groups (separates the two groups from each other);
a national border (separates a state from neighboring states or international waters).
Also, beware:
When I say boundary, I don't just mean an arbitrary constraint or social norm.
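To make "approximate causal separation" concrete, here is a minimal toy sketch (my own illustration in Python; the model and variable names are assumptions, not anything from the Sequence). The environment influences a cell's interior only through its membrane, so once the membrane's state is known, the environment carries almost no further information about the interior:

```python
# Toy sketch of a "boundary" as approximate causal separation (illustrative only).
# Causal chain: environment E -> membrane M -> interior I, with no direct E -> I edge.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

E = rng.integers(0, 2, size=n)                 # outside-world state
M = (E ^ (rng.random(n) < 0.1)).astype(int)    # membrane: noisy copy of E
I = (M ^ (rng.random(n) < 0.2)).astype(int)    # interior: depends only on M

def p_interior(given_m, given_e):
    mask = (M == given_m) & (E == given_e)
    return I[mask].mean()

# Holding the membrane fixed, the outside world tells us (almost) nothing
# more about the interior -- the two conditional estimates nearly coincide.
print(p_interior(1, 0), p_interior(1, 1))
```

The two printed estimates agree to within sampling noise, which is the sense in which the membrane "separates" inside from outside.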
Posts & researchers that link «Boundaries» and AI safety
All bolding in the excerpts below is mine.
Davidad’s OAA
Saliently, Davidad uses «Boundaries» for one of the four hypotheses he outlines in An Open Agency Architecture for Safe Transformative AI (2022 Dec):
Deontic Sufficiency Hypothesis: There exists a human-understandable set of features of finite trajectories in such a world-model, taking values in (−∞,0], such that we can be reasonably confident that all these features being near 0 implies high probability of existential safety, and such that saturating them at 0 is feasible[2] with high probability, using scientifically-accessible technologies.
I am optimistic about this largely because of recent progress toward formalizing a natural abstraction of boundaries by Critch and Garrabrant. I find it quite plausible that there is some natural abstraction property Q of world-model trajectories that lies somewhere strictly within the vast moral gulf of
All Principles That Human CEV Would Endorse ⇒ Q ⇒ Don't Kill Everyone
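To illustrate the shape of the hypothesis, here is a hedged sketch (my own; the names Trajectory, Feature, boundary_violations, and deontically_sufficient are hypothetical, not Davidad's notation): desiderata are features of trajectories valued in (−∞,0], and a trajectory counts as acceptable when every feature is saturated near 0:

```python
# Hedged sketch, not Davidad's formalism: safety desiderata as trajectory
# features valued in (-inf, 0], where a value near 0 means "satisfied".
from typing import Callable, Sequence

Trajectory = Sequence[dict]               # e.g. a list of world-model states
Feature = Callable[[Trajectory], float]   # returns a value in (-inf, 0]

def boundary_violations(traj: Trajectory) -> float:
    # 0 if no boundary is crossed; increasingly negative otherwise
    return -float(sum(state.get("boundary_crossings", 0) for state in traj))

def deontically_sufficient(traj: Trajectory,
                           features: list[Feature],
                           eps: float = 1e-3) -> bool:
    """All features saturated near 0 => trajectory counts as acceptable."""
    return all(f(traj) >= -eps for f in features)

# Usage: a trivial trajectory with no boundary crossings passes the check.
traj = [{"boundary_crossings": 0}, {"boundary_crossings": 0}]
print(deontically_sufficient(traj, [boundary_violations]))  # True
```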
Further explanation of this can be found in Davidad's Bold Plan for Alignment: An In-Depth Explanation (2023 Apr) by Charbel-Raphaël and Gabin:
Getting traction on the deontic feasibility hypothesis
Davidad believes that using formalisms such as Markov Blankets would be crucial in encoding the desiderata that the AI should not cross boundary lines at various levels of the world-model. We only need to “imply high probability of existential safety”, so according to davidad, “we do not need to load much ethics or aesthetics in order to satisfy this claim (e.g. we probably do not get to use OAA to make sure people don't die of cancer, because cancer takes place inside the Markov Blanket, and that would conflict with boundary preservation; but it would work to make sure people don't die of violence or pandemics)”. Discussing this hypothesis more thoroughly seems important.
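The cancer-versus-violence contrast above can be read as a simple classification rule. The following sketch (my own restatement, not OAA code) makes it explicit: a boundary-preservation desideratum covers harms whose causal source lies outside a person's boundary, and is silent about harms that originate inside it:

```python
# Illustrative only: which harms a "no boundary crossings" desideratum covers.
from dataclasses import dataclass

@dataclass
class Harm:
    name: str
    source_outside_boundary: bool   # does the causal chain cross the skin/membrane?

def covered_by_boundary_preservation(harm: Harm) -> bool:
    return harm.source_outside_boundary

harms = [Harm("violence", True), Harm("pandemic", True), Harm("cancer", False)]
for h in harms:
    print(h.name, covered_by_boundary_preservation(h))
# violence True, pandemic True, cancer False -- matching the examples in the quote.
```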
Also see:
() Elicitors: Langua...