The Nonlinear Library

LW - Boundaries-based security and AI safety approaches by Allison Duettmann



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Boundaries-based security and AI safety approaches, published by Allison Duettmann on April 12, 2023 on LessWrong.
[This is part 3 of a 5-part sequence on security and cryptography areas relevant for AI safety, published and linked here a few days apart.]
There is a long-standing computer security approach that may have directly useful parallels to a recent strand of AI safety work. Both rely on the notion of ‘respecting boundaries’. Since the computer security approach has been around for a while, there may be useful lessons to draw from it for the more recent AI safety work. Let's start with AI safety, then introduce the security approach, and finish with parallels.
AI safety: Boundaries in The Open Agency Model and the Acausal Society
In a recent LW post, The Open Agency Model, Eric Drexler expands on his previous CAIS work by introducing ‘open agencies’ as a model for AI safety. In contrast to the often proposed opaque or unitary agents, “agencies rely on generative models that produce diverse proposals, diverse critics that help select proposals, and diverse agents that implement proposed actions to accomplish tasks”, subject to ongoing review and revision.
In An Open Agency Architecture for Safe Transformative AI, Davidad expands on Eric Drexler’s model, suggesting that, instead of optimizing, this model would ‘depessimize’ by reaching a world that has existential safety. So rather than a fully-fledged AGI-enforced optimization scenario that implements all principles CEV would endorse, this would be a more modest approach that relies on the notion of important boundaries (including those of human and AI entities) being respected.
What could it mean to respect the boundaries of human and AI entities? In Acausal Normalcy, Andrew Critch also discusses the notion of respecting boundaries with respect to coordination in an acausal society. He thinks it’s possible that an acausal society generally holds values related to respecting boundaries. He defines ‘boundaries’ as the approximate causal separation of regions, either in physical spaces (such as spacetime) or abstract spaces (such as cyberspace). Respecting them intuitively means relying on the consent of the entity on the other side of the boundary when interacting with them: only using causal channels that were endogenously opened.
His examples of currently used boundaries include a person's skin that separates the inside of their body from the outside, a fence around a family's yard that separates their place from neighbors, a firewall that separates the LAN and its users from the rest of the internet, and a sustained disassociation of social groups that separates the two groups. In his Boundaries Sequence, Andrew Critch continues to formally define the notions of boundaries to generalize them to very different intelligences.
If the concept of respecting boundaries is in fact universally salient across intelligences, then it may be possible to help AIs discover and respect the boundaries humans find important (and potentially vice versa).
Computer security: Boundaries in the Object Capabilities Approach
Pursuing a similar idea, in Skim the Manual, Christine Peterson, Mark S. Miller, and I reframe the AI alignment problem as a secure cooperation problem across human and AI entities. Throughout history, we developed norms for human cooperation that emphasize the importance of respecting physical boundaries, for instance to not inflict violence, and cognitive boundaries, for instance to rely on informed consent. We also developed approaches for computational cooperation that emphasize the importance of respecting boundaries in cyberspace. For instance, in object-capabilities-oriented programming, individual computing entities are encapsulated to prevent interference with the contents of other objects.
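To make the encapsulation idea concrete, here is a minimal sketch in Python of the object-capability pattern: state is reachable only through the references (capabilities) an object chooses to hand out, and a holder can pass on a narrower capability than the one it received. All names here (`make_counter`, `Counter`, `read_only`) are hypothetical and chosen for illustration; they do not come from the essay. Note also that Python does not actually enforce this boundary (closures can be inspected through introspection), so this shows the shape of the discipline rather than a secure implementation.

```python
from typing import Callable, NamedTuple


class Counter(NamedTuple):
    """A bundle of capabilities: a holder can only call what it was handed."""
    increment: Callable[[], int]
    read: Callable[[], int]


def make_counter() -> Counter:
    """Create a counter whose state lives in a closure, not a public attribute."""
    count = 0

    def increment() -> int:
        nonlocal count
        count += 1
        return count

    def read() -> int:
        return count

    return Counter(increment=increment, read=read)


def read_only(counter: Counter) -> Callable[[], int]:
    """Attenuate the capability: pass on only the ability to read."""
    return counter.read


counter = make_counter()
counter.increment()
counter.increment()

# The viewer was given a channel for reading and nothing else.
viewer = read_only(counter)
print(viewer())
```

The party holding `viewer` can observe the count but has no reference through which to change it, which loosely mirrors the idea of interacting only through channels the other side has opened.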
The fact that ...

The Nonlinear Library, by The Nonlinear Fund
