


Goal of This Post
I have never seen anyone give a satisfying intuitive explanation of what corrigibility (in roughly Eliezer's sense of the word) is. There are lists of desiderata, but they sound like scattered wishlists which don’t obviously point to a unified underlying concept at all. There's also Eliezer's extremely meta pointer:
We can imagine, e.g., the AI imagining itself building a sub-AI while being prone to various sorts of errors, asking how it (the AI) would want the sub-AI to behave in those cases, and learning heuristics that would generalize well to how we would want the AI to behave if it suddenly gained a lot of capability or was considering deceiving its programmers and so on.
… and that's basically it.[1]
In this post, we’re going to explain a reasonably-unified concept which seems like a decent match to “corrigibility” in Eliezer's sense.
Tools
Starting point: we think [...]
---
Outline:
(00:05) Goal of This Post
(01:03) Tools
(02:40) Respecting Modularity
(04:49) Visibility and Correctability
(06:00) Let's Go Through A List Of Desiderata
(11:53) What Would It Look Like To Use A Powerful Corrigible AGI?
(14:11) From Cognition to Real Patterns?
The original text contained 6 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
