The Nonlinear Library

AF - Why are counterfactuals elusive? by Martín Soto



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why are counterfactuals elusive?, published by Martín Soto on March 3, 2023 on The AI Alignment Forum.
Produced as part of SERI MATS 3.0. Thanks to Vivek Hebbar and Paul Colognese for discussion.
TL;DR (spoiler):
Behind the problem of human counterfactuals creeps the problem of understanding abstraction / ontology identification.
A nice theory of counterfactuals would be useful for many things, including low-impact measures for corrigible AI:
a flooded workshop changes a lot of things that don't have to change as a consequence of the cauldron being filled at all, averaged over a lot of ways of filling the cauldron. [the natural operationalization of this averaging requires counterfactuals]
So whence the difficulty of obtaining one?
Well, we do have at least one well-defined class of counterfactuals: "just take a chunk of atoms, replace it by another, and continue running the laws of physics". This is a discontinuity in the laws of physics that would never take place in the real world, but we don't care about that: we can just continue running the mathematical laws of physics from that state, as if we were dealing with a Game of Life board.
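As a rough illustration of this well-defined class, here is a minimal sketch on an actual Game of Life board. The names `step`, `run` and `physical_counterfactual`, and the set-of-live-cells representation, are assumptions made for this example, not anything from the original post. The intervention just overwrites a region and then keeps applying the ordinary rules.

```python
from typing import Dict, FrozenSet, Tuple

Cell = Tuple[int, int]
Board = FrozenSet[Cell]  # the set of live cells on an unbounded grid


def step(board: Board) -> Board:
    """Apply one tick of the standard Game of Life rules."""
    neighbour_counts: Dict[Cell, int] = {}
    for (row, col) in board:
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                if d_row or d_col:
                    cell = (row + d_row, col + d_col)
                    neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return frozenset(
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in board)
    )


def run(board: Board, steps: int) -> Board:
    """Run the rules forward for a number of ticks."""
    for _ in range(steps):
        board = step(board)
    return board


def physical_counterfactual(
    board: Board, region: FrozenSet[Cell], patch: FrozenSet[Cell], steps: int
) -> Board:
    """Overwrite `region` with `patch`, then keep running the same rules.

    The overwrite is the 'discontinuity'; everything after it is lawful.
    """
    outside = {cell for cell in board if cell not in region}
    intervened = frozenset(outside | (patch & region))
    return run(intervened, steps)


# A horizontal blinker oscillates with period 2 when left alone.
blinker: Board = frozenset({(1, 0), (1, 1), (1, 2)})
factual = run(blinker, 2)

# Counterfactual: blank out the middle cell, then continue as usual.
counterfactual = physical_counterfactual(
    blinker, region=frozenset({(1, 1)}), patch=frozenset(), steps=1
)
```

Left alone, the blinker returns to its starting shape after two ticks. With the middle cell removed, the two remaining cells have no live neighbours and the board is empty one tick later. Nothing about the intervention had to be reachable from a lawful earlier state.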
But this doesn't correspond to our intuitive notion of counterfactuals. When humans think about counterfactuals, we are basically changing the state of a latent variable inside our heads, and rerunning a computation. For example, maybe we change the state of the "yesterday's weather" variable from "sunny" to "rainy", and rerun the computation "how did the picnic go?".
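The picnic example can be sketched in a few lines. This is only a toy under stated assumptions: the dictionary of latent variables, the function `how_did_the_picnic_go`, and its outputs are all hypothetical stand-ins for whatever the mind actually computes.

```python
from typing import Callable, Dict

Latents = Dict[str, str]


def how_did_the_picnic_go(latents: Latents) -> str:
    """Hypothetical stand-in for the mind's 'picnic' computation."""
    if latents["yesterday_weather"] == "rainy":
        return "soggy and cut short"
    return "pleasant"


def mental_counterfactual(
    latents: Latents,
    variable: str,
    new_value: str,
    computation: Callable[[Latents], str],
) -> str:
    """Change one latent variable on a copy and rerun the computation."""
    altered = dict(latents)  # leave the factual beliefs untouched
    altered[variable] = new_value
    return computation(altered)


beliefs: Latents = {"yesterday_weather": "sunny", "location": "park"}

factual = how_did_the_picnic_go(beliefs)
counterfactual = mental_counterfactual(
    beliefs, "yesterday_weather", "rainy", how_did_the_picnic_go
)
```

Note that the code never says which arrangement of atoms "rainy" refers to. The change is made entirely at the level of the mind's own variables.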
The problem with this is that our latent variables don't neatly correspond to parts of physical reality. Sometimes they don't correspond to any part of physical reality at all! And so some (in fact, most) of the variable changes we offhandedly perform don't univocally correspond to physical counterfactuals natively expressed in our laws of physics.
If you just replace a three-dimensional cube of atmosphere with one that contains a rain cloud, people will notice that a cloud appeared out of nowhere. So, as a necessary consequence, people will be freaked out by this artificial fact, which is not at all what you had in mind for your counterfactual. Sometimes you'll be able to just add the cloud when no one is looking. But most times, and especially when dealing with messier human concepts, the physical counterfactual will be under-determined, or none of the candidates will correspond to what you had in mind when using your neatly compartmentalized variables.
This is not to say human counterfactuals are meaningless: they are a way of taking advantage of regularities discovered in the world. When a physicist says "if I had put system A there, it would have evolved into system B", they just mean that said causal relation has been demonstrated by their experiments, or is predicted by their well-tested gears-level theories (modulo the philosophical problem of induction, as always). Similarly, a counterfactual might help you notice or remember that rainy days are no good for picnics, which is useful for future action.
But it becomes clear that such natural language counterfactuals depend on the mind's native concepts. And so, instead of a neat and objective mathematical definition that makes sense of these counterfactuals, we should expect ontology identification (matching our concepts with physical reality) to be the hard part of operationalizing them.
More concretely, suppose we had a solution to ontology identification: a probability distribution P(Mindstate|Worldstate). By additionally having a prior over worldstates (or mindstates), we can obtain the dual distribution P(Worldstate|Mindstate) via Bayes' rule. And given that, we can just use the do() operator in a mindstate to natively implement the counterfactual, and then condition on the new mindstate to find which probability distribution over reality it correspond...
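The inversion step mentioned here can be shown with a toy numerical sketch. Everything concrete in it is an assumption made for illustration: the three worldstates, the two mindstates, and all the probabilities are invented. Because this toy mind has a single variable, do() is modelled as simply overwriting that variable, which is far simpler than a real causal intervention.

```python
from typing import Dict

# Hypothetical prior over worldstates (invented numbers).
prior: Dict[str, float] = {
    "clear_sky": 0.7,
    "ordinary_rain": 0.299,
    "cloud_from_nowhere": 0.001,  # the atom-swap scenario is a priori rare
}

# Hypothetical P(Mindstate | Worldstate), i.e. the assumed
# solution to ontology identification.
likelihood: Dict[str, Dict[str, float]] = {
    "clear_sky": {"sunny": 0.95, "rainy": 0.05},
    "ordinary_rain": {"sunny": 0.10, "rainy": 0.90},
    "cloud_from_nowhere": {"sunny": 0.10, "rainy": 0.90},
}


def posterior_over_worlds(
    mindstate: str,
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Bayes' rule: P(world | mind) is proportional to P(mind | world) * P(world)."""
    unnormalized = {
        world: prior[world] * likelihood[world][mindstate] for world in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("mindstate has zero probability under the prior")
    return {world: p / total for world, p in unnormalized.items()}


factual_mind = "sunny"
counterfactual_mind = "rainy"  # toy do(): overwrite the single mind variable

worlds_given_rainy = posterior_over_worlds(counterfactual_mind, prior, likelihood)
most_likely_world = max(worlds_given_rainy, key=worlds_given_rainy.get)
```

With these made-up numbers, conditioning on the "rainy" mindstate puts most of the probability on the ordinary-rain worldstate and very little on the cloud that appears out of nowhere, since the prior already treats the latter as rare.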

The Nonlinear Library, by The Nonlinear Fund
