The Nonlinear Library

AF - Incentives from a causal perspective by Tom Everitt



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Incentives from a causal perspective, published by Tom Everitt on July 10, 2023 on The AI Alignment Forum.
Post 4 of Towards Causal Foundations of Safe AGI, preceded by Post 1: Introduction, Post 2: Causality, and Post 3: Agency.
By Tom Everitt, James Fox, Ryan Carey, Matt MacDermott, Sebastian Benthall, and Jon Richens, representing the Causal Incentives Working Group. Thanks also to Toby Shevlane and Aliya Ahmad.
“Show me the incentive, and I’ll show you the outcome” - Charlie Munger
Predicting behaviour is an important problem when designing and deploying agentic AI systems. Incentives capture some of the key forces that shape agent behaviour, and analysing them doesn’t require us to fully understand the internal workings of a system.
This post shows how a causal model of an agent and its environment can reveal what the agent wants to know and what it wants to control, as well as how it will respond to commands and influence its environment. A complementary result shows that some incentives can only be inferred from a causal model, so a causal model of the agent’s environment is strictly necessary for a full incentive analysis.
Value of information
What information would an agent like to learn? Consider, for example, Mr Jones deciding whether to water his lawn, based on the weather report, and whether the newspaper arrived in the morning. Knowing the weather means that he can water more when it will be sunny than when it will be raining, which saves water and improves the greenness of the grass. The weather forecast therefore has information value for the sprinkler decision, and so does the weather itself, but the newspaper arrival does not.
We can quantify how useful observing the weather is for Mr Jones, by comparing his expected utility in a world in which he does observe the weather, to a world in which he doesn’t. (This measure only makes sense if we can assume that Mr Jones adapts appropriately to the different worlds, i.e. he needs to be agentic in this sense.)
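As a rough illustration of this comparison, the sketch below computes the value of observing the weather in a toy version of the sprinkler decision. The probabilities, the utility numbers, and all names (P_WEATHER, UTILITY, value_of_information) are made up for this example and do not come from the post.

```python
# Hypothetical numbers: none of these come from the post.
P_WEATHER = {"sunny": 0.5, "rainy": 0.5}
ACTIONS = ("water", "skip")

# Assumed utility of each (weather, action) pair: watering helps on sunny
# days and wastes water on rainy days.
UTILITY = {
    ("sunny", "water"): 2.0,
    ("sunny", "skip"): 0.0,
    ("rainy", "water"): 1.0,
    ("rainy", "skip"): 2.0,
}


def expected_utility_without_observation():
    """Best expected utility when one action must be fixed for all weather."""
    return max(
        sum(P_WEATHER[w] * UTILITY[(w, a)] for w in P_WEATHER)
        for a in ACTIONS
    )


def expected_utility_with_observation():
    """Best expected utility when the action may depend on the weather."""
    return sum(
        P_WEATHER[w] * max(UTILITY[(w, a)] for a in ACTIONS)
        for w in P_WEATHER
    )


def value_of_information():
    """Gain in expected utility from observing the weather before acting."""
    return (
        expected_utility_with_observation()
        - expected_utility_without_observation()
    )


print(value_of_information())  # 0.5 under the assumed numbers
```

Both worlds assume that Mr Jones picks the best available policy, which is the "adapts appropriately" condition mentioned above. An observation that is independent of both the weather and the utility, such as the newspaper arrival, would leave the two expected utilities equal.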
The causal structure of the environment reveals which variables provide useful information. In particular, the d-separation criterion captures whether information can flow between variables in a causal graph when a subset of variables is observed. In single-decision graphs, value of information is possible when there is an information-carrying path from a variable to the agent’s utility node, when conditioning on the decision node and its parents (i.e. the “observed” nodes).
For example, in the above graph, there is an information-carrying path from forecast to weather to grass greenness, when conditioning on the sprinkler, forecast and newspaper. This means that the forecast can (and likely will) provide useful information about optimal watering. In contrast, there is no such path from the newspaper arrival. In that case, we call the information link from the newspaper to the sprinkler nonrequisite.
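The path check described here can be sketched in a few lines of Python. The edge list is our reconstruction of the sprinkler graph from the description in the text, so treat it as an assumption. The function names are hypothetical. The d-separation test uses the standard moralised ancestral graph method, and we assume the conditioning set is the decision and its parents, excluding the variable being tested.

```python
from itertools import combinations

# Assumed edges, reconstructed from the prose description of the example.
EDGES = [
    ("weather", "forecast"),
    ("weather", "greenness"),
    ("forecast", "sprinkler"),
    ("newspaper", "sprinkler"),
    ("sprinkler", "greenness"),
]


def parents(node, edges):
    return {a for a, b in edges if b == node}


def ancestral_set(nodes, edges):
    """Return the given nodes together with all of their ancestors."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        for p in parents(frontier.pop(), edges):
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result


def d_connected(x, y, given, edges):
    """True if x and y are d-connected given the set `given`."""
    keep = ancestral_set({x, y} | set(given), edges)
    # Build the moral graph: drop edge directions and link co-parents.
    adj = {n: set() for n in keep}
    for a, b in edges:
        if a in keep and b in keep:
            adj[a].add(b)
            adj[b].add(a)
    for n in keep:
        for p, q in combinations(sorted(parents(n, edges)), 2):
            adj[p].add(q)
            adj[q].add(p)
    # Search for a path from x to y that avoids the conditioned nodes.
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        if n == y:
            return True
        for m in adj[n]:
            if m not in seen and m not in given:
                seen.add(m)
                stack.append(m)
    return False


def admits_value_of_information(x, decision, utility, edges):
    """Graphical check: can observing x matter for the decision?"""
    given = (parents(decision, edges) | {decision}) - {x}
    return d_connected(x, utility, given, edges)


for var in ("forecast", "weather", "newspaper"):
    print(var, admits_value_of_information(var, "sprinkler", "greenness", EDGES))
```

Under the assumed edges, the check succeeds for the forecast and the weather and fails for the newspaper, matching the discussion above. A positive result only says that value of information is possible for some choice of probabilities and utilities, not that it is positive for every such choice.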
Understanding what information an agent wants to obtain is useful for several reasons. First, in e.g. fairness settings, the question of why a decision was made is often as important as what the decision was. Did gender determine a hiring decision? Value of information can help us understand what information the system is trying to glean from its available observations (though a formal understanding of proxies remains an important open question).
More philosophically, some researchers define an agent’s cognitive boundary as the set of events that the agent cares to measure and influence. Events that lack value of information must fall outside the measuring part of this boundary.
Response incentives
Related to the value of information are response incentives: what changes in the environment would a decision chosen by an optimal policy respond to? Changes are operationalised as post-policy interventions, i.e. as interventions that...

The Nonlinear Library, by The Nonlinear Fund
