February 12, 2023

AF - The conceptual Dopplegänger problem by Tsvi Benson-Tilsen

4 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The conceptual Dopplegänger problem, published by Tsvi Benson-Tilsen on February 12, 2023 on The AI Alignment Forum.

[Metadata: crossposted from. First completed 9 October 2022.]

Suppose we want to observe the thoughts of a mind in order to detect whether it's making its way towards a plan to harm us, and ideally also to direct the mind so that it pursues specific aims. To this end, we might hope that the mind and its thinking are organized in a way we can come to understand in the way that we understand ourselves and our thinking.

We might hope that when the mind considers plans that involve something, e.g. plans that involve the coffee cup, it does so using a concept alike to our concept [[coffee cup]]. When the mind recognizes, predicts, imagines, simulates, manipulates, designs, combines things with, describes, studies, associates things with, summarizes, remembers, compares things with, deduces things about, makes hypotheses about, or is otherwise mentally involved with the coffee cup, maybe it always does so in a way that is fully comprehendable in fixed terms that are similar to the terms in which we understand ourselves when we do those activities. Maybe the structure involved in psychic events in the mind reliably falls into basins of attraction that indicate unambiguously to us, as we observe these events, which nexi of reference that structure constitutes.

Maybe the X-and-only-X problem is solved by ensuring that the mind's thoughts are in a language made of these concepts; when the mind plans to "fetch the coffee", it somehow means only fetching the coffee, in the "natural" sense of [[fetch]] and [[the coffee]].

One obstacle to this rosy picture is conceptual Dopplegängers. A conceptual Dopplegänger of some concept Z, is a concept Z' that serves some overlapping functions in the mind as Z serves, but is psychically distinct from Z. Here saying that Z' is psychically distinct from Z is ambiguous, but means something like: Z' is not transparently closely related to Z, or is mechanistically / physically separate from Z, or is referred to in a set of contexts that's systematically segregrated from the contexts in which Z is referred to, or is not explicitly described or treated as being the same as or similar to or analogous to Z. A Dopplegänger concept Z' enables a mind to think about what Z is about, at least in some respects, without psychically using Z. This makes it hard to be sure that the mind is not thinking about what Z is about; even if the mind is not using Z, it might be thinking about what Z is about by using some Z'.

Maybe Dopplegängers of Z can be psychically located by doing something like looking for mental stuff that has high mutual logical information with Z. This might work to identify blatant deception: if the mind maintains a puppet show of fake thoughts using Z and has its real thoughts using a Z' that's psychically isomorphic to Z, then Z' will be obviously related to Z. But, Dopplegängers don't have to be so obvious. Mental stuff that constitutes skill with manipulating what Z is about, can be, compared to Z, more or less:

partial

implicit

diffuse (diffused throughout other skills and knowledge)

encrypted

externalized

transiently reconstructed out of precursors when needed

structurally deep (and therefore alien to someone who thinks in terms of Z)

Baldwinized to specific purposes

and can be

expressed in a different language or constituted by differently-factored concepts

referred to in a set of contexts that's systematically segregrated from the contexts in which Z is referred to.

All of these features make it harder to see that Z' is in some respects a Doppelgänger of Z. In other words, to the extent these features (and probably others) characterize mental stuff in the mind, the mind is liable to be thinking about coffee cups even wh...

...more