Control, published by Tsvi Benson-Tilsen on February 5, 2023 on The AI Alignment Forum.
[Metadata: crossposted from. First completed 3 July 2022.]
I don't know how to define control or even point at it except as a word-cloud, so it's probably wanting to be refactored. The point of talking about control is to lay part of the groundwork for understanding what determines what directions a mind ends up pushing the world in. Control is something like what's happening when values or drives are making themselves felt as values or drives. ("Influence" = "in-flow" might be a better term than "control".)
Previous: Structure, creativity, and novelty
Definitions of control
Control is when an element makes another element do something. This relies on elements "doing stuff".
Control is when an element {counterfactually, evidentially, causally, logically...} determines {the behavior, the outcome of the behavior} of an assembly of elements.
Control is when an element modifies the state of an element. This relies on elements having a state. Alternatively, control is when an element replaces an element with a similar element.
Control is when an element selects something according to a criterion.
These definitions aren't satisfying in part because they rely on the pre-theoretic ideas of "makes", "determines", "modifies", "selects". Those ideas could be defined precisely in terms of causality, but doing that would narrow their scope and elide some of the sense of "control". To say, pre-theoretically, "My desire for ice cream is controlling where I'm walking" is sometimes to say "The explanation for why I'm walking along such-and-such a path is that I'm selecting actions based on whether they'll get me ice cream, and that such-and-such a path leads to ice cream"; and explanation in general doesn't have to be about causality. Control is whatever lies behind the explanations given in answer to questions like "What's controlling X?", "How does Y control Z?", and "How can I control W?".
Another way the above definitions are unsatisfactory is that they aren't specific enough; some of them would say that if I receive a message and then update my beliefs according to an epistemic rule, that message controls me. That might be right, but it's a little counterintuitive to me.
There's a tension between describing the dynamics of a mind--how the parts interact over time--vs. describing the outcomes of a mind, which is more easily grasped with gemini modeling of "desires". (I.e. by having your own copy of the "desire" and your own machinery for playing out the same meaning of the "desire" analogously to the original "desire" in the original mind.) I'm focusing on dynamical concepts because they seem more agnostic as discussed above, but it might be promising to instead start with presumptively unified agency and then distort / modify / differentiate / deform / vary the [agency used to gemini model a desire] to allow for modeling less-presumptively-coherent control. (For discussion of the general form of this "whole->wholes" approach, distinct from the "parts->wholes" approach, see Non-directed conceptual founding.) Another definition of control in that vein, a variation on a formula from Sam Eisenstat:
Control is an R-stable relationship between an R-stable element and R-unstable prior/posterior elements (which therefore play overlapping roles). "R-stable" means stable under ontological Revolutions. That is, we have C(X,Y) and C(X,Z), where X and C are somehow the same before and after an ontological revolution, and Y and Z aren't the same.
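The condition above can be sketched schematically (my notation, not necessarily Eisenstat's; a rough formalization under the assumption that "somehow the same" is some sameness relation ≈ across the revolution). Writing R for an ontological revolution:

\[
C(X, Y) \text{ holds before } R, \qquad C'(X', Z) \text{ holds after } R,
\]
\[
C' \approx C, \quad X' \approx X, \quad Z \not\approx Y.
\]

That is, the relation C and the controlling element X survive the revolution up to ≈, while the controlled element does not: Y is replaced by a non-same element Z that plays an overlapping role.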
Control vs. values
I'm talking about control rather than "values" because I don't want to assume:
that there are terminal values,
that there's a clear distinction between terminal values and non-terminal values,
that there are values stable across time and m...