
Audio note: this article contains 88 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
This post presents a simple toy coherence theorem, and then uses it to address various common confusions about coherence arguments.
Setting
Deterministic MDP. That means at each time _t_ there's a state _S[t]_[1], the agent/policy takes an action _A[t]_ (which can depend on both time _t_ and current state _S[t]_), and then the next state _S[t+1]_ is fully determined by _S[t]_ and _A[t]_. The current state and current action are sufficient to tell us the next state.
We will think about values over the state at some final time _T_. Note that often in MDPs there is an incremental reward each timestep in addition to a final reward at the end; in our setting there is zero [...]
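The following is a minimal sketch of the deterministic-MDP setting just described, written in Python. The state and action types, the step-counter convention, and the example policy and transition function are illustrative assumptions, not taken from the original post; the only properties it encodes are that the action may depend on the time _t_ and the state _S[t]_, that _S[t+1]_ is fully determined by _S[t]_ and _A[t]_, and that only the final state at time _T_ matters.

```python
# Minimal sketch of the deterministic-MDP setting (illustrative assumptions,
# not the original post's code).

from typing import Callable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")

def rollout(
    initial_state: State,
    policy: Callable[[int, State], Action],        # action may depend on time t and state S[t]
    transition: Callable[[State, Action], State],  # S[t+1] is fully determined by S[t] and A[t]
    T: int,
) -> State:
    """Run the deterministic MDP for T steps and return the final state S[T]."""
    state = initial_state
    for t in range(T):
        action = policy(t, state)
        state = transition(state, action)
    return state

# Hypothetical example: integer states, actions are +1/-1 steps, and any value
# assigned to the trajectory depends only on the final state S[T].
if __name__ == "__main__":
    final = rollout(
        initial_state=0,
        policy=lambda t, s: +1,         # a simple stationary policy
        transition=lambda s, a: s + a,  # deterministic next state
        T=5,
    )
    print(final)  # 5 -- no per-step reward, only the final state is evaluated
```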
---
Outline:
(00:25) Setting
(01:30) Theorem
(01:48) Proof
(07:51) Anti-Takeaways: Things Which Don't Generalize
(07:57) Determinism
(08:48) Takeaways: Things Which (I Expect) Do Generalize
(08:55) Coherence Is Nontrivial For Optimization At A Distance
(11:18) We Didn't Need Trades, Adversaries, Money-Pumping, Etc
(12:06) Approximation
(12:54) Coherence Is About Revealed Preferences
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
