May 10, 2024

“Why Care About Natural Latents?” by johnswentworth, David Lorell

9 minutes

Suppose Alice and Bob are two Bayesian agents in the same environment. They both basically understand how their environment works, so they generally agree on predictions about any specific directly-observable thing in the world - e.g. whenever they try to operationalize a bet, they find that their odds are roughly the same. However, their two world models might have totally different internal structure, different “latent” structures which Alice and Bob model as generating the observable world around them. As a simple toy example: maybe Alice models a bunch of numbers as having been generated by independent rolls of the same biased die, and Bob models the same numbers using some big complicated neural net.

Now suppose Alice goes poking around inside of her world model, and somewhere in there she finds a latent variable Λ_A with two properties (the Natural Latent properties):

Λ_A approximately mediates between two different [...]

---

Outline:

(04:23) So What Could We Do With That?

(04:35) Interpretability

(06:29) Value Learning and The Pointers Problem

(08:03) Where We’re Currently Headed With This

The original text contained 1 footnote which was omitted from this narration.

---

First published:

May 9th, 2024

Source:

https://www.lesswrong.com/posts/RTiuLzusJWyepFpbN/why-care-about-natural-latents

---

Narrated by TYPE III AUDIO.

By LessWrong
