Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations), published by Thane Ruthenis on December 22, 2023 on The AI Alignment Forum.
Epistemic status: I'm currently unsure whether this is a fake framework, a probably-wrong mechanistic model, or a legitimate insight into the fundamental nature of agency. Regardless, viewing things from this angle has been helpful for me.
In addition, the ambitious implications of this view are one of the reasons I'm fairly optimistic about arriving at a robust solution to alignment via agent-foundations research in a timely manner. (My semi-arbitrary deadline is 2030, and I expect to arrive at solid intermediate results by EOY 2025.)
Input Side: Observations
Consider what happens when we draw inferences based on observations.
Photons hit our eyes. Our brains draw an image aggregating the information each photon gave us. We interpret this image, decomposing it into objects, and inferring which latent-variable object is responsible for generating which part of the image. Then we wonder further: what process generated each of these objects? For example, if one of the "objects" is a news article, what is it talking about? Who wrote it? What events is it trying to capture? What set these events into motion? And so on.
In diagram format, we're doing something like this:
We take in observations, infer what latent variables generated them, then infer what generated those variables, and so on. We go backwards: from effects to causes, iteratively. The Cartesian boundary of our input can be viewed as a "mirror" of a sort, reflecting the Past.
It's a bit messier in practice, of course. There are shortcuts, ways to map immediate observations to far-off states. But the general idea mostly checks out - especially given that these "shortcuts" probably still implicitly route through all the intermediate variables, just without explicitly computing them. (You can map a news article to the events it's describing without explicitly modeling the intermediary steps of witnesses, journalists, editing, and publishing.)
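To make this backward-inference picture concrete, here's a minimal sketch in Python. It's my own toy illustration, not anything from the post: a two-step causal chain (event → article → observation) with made-up numbers, where the observation at the boundary is "reflected" backwards one latent layer at a time.

```python
# A minimal sketch (my own illustrative example, not from the post) of "the input
# boundary as a mirror of the Past": observations at the boundary are propagated
# backwards, layer by layer, through a chain of latent causes.
#
# Toy causal chain:  event -> article -> observation
# All variables are binary; the numbers below are made up for illustration.

import numpy as np

# Priors and conditionals of the toy generative model (cause -> effect).
p_event = np.array([0.7, 0.3])                  # P(event)
p_article_given_event = np.array([[0.9, 0.1],   # P(article | event), rows indexed by event
                                  [0.2, 0.8]])
p_obs_given_article = np.array([[0.8, 0.2],     # P(observation | article), rows indexed by article
                                [0.1, 0.9]])

def reflect_backwards(observation: int):
    """Propagate the observation backwards through the chain,
    producing a posterior over each successively deeper latent cause."""
    # Step 1: likelihood message from the observation to the nearest latent.
    message = p_obs_given_article[:, observation]     # P(obs | article), as a function of article

    # Posterior over the nearest latent: P(article | obs) ∝ P(article) * P(obs | article).
    p_article = p_article_given_event.T @ p_event     # marginal P(article)
    post_article = p_article * message
    post_article /= post_article.sum()

    # Step 2: push the message one causal layer deeper:
    # P(obs | event) = sum_article P(article | event) * P(obs | article).
    message = p_article_given_event @ message

    # Posterior over the deeper cause: P(event | obs) ∝ P(event) * P(obs | event).
    post_event = p_event * message
    post_event /= post_event.sum()

    return post_article, post_event

post_article, post_event = reflect_backwards(observation=1)
print("P(article | obs):", post_article)   # inference about the proximate cause
print("P(event   | obs):", post_event)     # inference about the cause behind it
```

Each step is the same move: take the likelihood message sitting at the boundary, push it one causal layer deeper, and combine it with the prior there - which is just the "effects to causes, iteratively" loop above, written out.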
Output Side: Actions
Consider what happens when we're planning to achieve some goal, in a consequentialist-like manner.
We envision the target state: what we want to achieve, what the world would look like. Then we ask ourselves: what would cause this? What forces could influence the outcome to align with our desires? And then: how do we control these forces? What actions would we need to take in order to make the network of causes and effects steer the world towards our desires?
In diagram format, we're doing something like this:
We start from our goals, infer what latent variables control their state in the real world, then infer what controls those latent variables, and so on. We go backwards: from effects to causes, iteratively, until we arrive at our own actions. The Cartesian boundary of our output can be viewed as a "mirror" of a sort, reflecting the Future.
It's a bit messier in practice, of course. There are shortcuts, ways to map far-off goals to immediate actions. But the general idea mostly checks out - especially given that these heuristics probably still implicitly route through all the intermediate variables, just without explicitly computing them. ("Acquire resources" is a good heuristic starting point for basically any plan.)
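The output side admits the same kind of toy sketch. Again, this is my own hypothetical illustration rather than the post's formalism: a tiny hand-written causal graph where we walk backwards from the goal, asking at each node what controls it, until the chain bottoms out in nodes we can directly act on.

```python
# A minimal sketch (my own illustration, not from the post) of "the output boundary
# as a mirror of the Future": start from the goal, then repeatedly ask "what latent
# variables control this?" until the chain of causes bottoms out in our own actions.
#
# The graph below is hypothetical; `controlled_by` maps each variable to the
# upstream variables that causally influence it.

controlled_by = {
    "goal_achieved": ["market_conditions", "product_quality"],
    "product_quality": ["team_effort", "resources"],
    "market_conditions": [],                # outside our control entirely
    "team_effort": ["action: hire", "action: set incentives"],
    "resources": ["action: acquire resources"],
}

def plan_backwards(goal: str):
    """Walk backwards from the goal through the causal graph,
    collecting the actions (controllable leaves) the plan bottoms out in."""
    actions, frontier, visited = [], [goal], set()
    while frontier:
        node = frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        if node.startswith("action:"):
            actions.append(node)            # we've reached something we can directly do
            continue
        # Otherwise, ask: what controls this variable? Recurse one causal layer back.
        frontier.extend(controlled_by.get(node, []))
    return actions

print(plan_backwards("goal_achieved"))
# e.g. ['action: acquire resources', 'action: set incentives', 'action: hire']
```

Structurally, it's the mirror image of the inference sketch above: the same backward walk through causes, just anchored at the Future instead of the Past.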
And indeed, that side of my formulation isn't novel! From this post by Scott Garrabrant:
Time is also crucial for thinking about agency. My best short-phrase definition of agency is that agency is time travel. An agent is a mechanism through which the future is able to affect the past. An agent models the future consequences of its actions, and chooses actions on the basis of those consequences. In that sense, the consequence causes the action, in spite of the fact that the ac...