LessWrong (30+ Karma)

“Agents Can Get Stuck in Self-distrusting Equilibria” by Ashe Vazquez Nuñez



Or: Identities as Schelling Fences for Embedded Agents

This post was written as part of research done at MATS 9.0 under the mentorship of Richard Ngo. He contributed significantly to the ideas discussed within.

Introduction

This post questions the sanctity of the "agent" and discusses how Temporal Instances (TIs) of an agent can come into conflict because they distrust one another. These dynamics can be described mathematically as an intrapersonal cooperative game. I define a temporal version of Nash equilibrium and give an example of a self-punishing pattern between TIs that is nevertheless stable.

This leads us to ask what conditions allow disparate parts of an agent to cooperate harmoniously. I conjecture that agents showing a degree of consistency in their actions over time can be seen as adhering to an identity that replaces Common Knowledge of Rationality (CKR) between the game's players. In subscribing to a common identity, TIs declare trust in each other akin to that which an updateless[1] agent would embody.

I then consider the shape that a formal statement and proof of this conjecture are likely to take. This will involve translating universal type spaces to intrapersonal games for a complete treatment of CKR. I also [...]

---

Outline:

(00:26) Introduction

(01:42) The incoherent self

(03:22) What does coherence look like?

(10:15) A mathematical framework for (lack of) self-trust

(13:56) Further work and conjectures

(15:26) Better notions of belief

(16:24) Robust equilibria and updatelessness

(19:08) What does this have to do with AI?

The original text contained 10 footnotes which were omitted from this narration.

---

First published:

March 24th, 2026

Source:

https://www.lesswrong.com/posts/MGoCFnCRYufwTyAD5/agents-can-get-stuck-in-self-distrusting-equilibria

---

Narrated by TYPE III AUDIO.
