
Goal of This Post
I have never seen anyone give a satisfying intuitive explanation of what corrigibility (in roughly Eliezer's sense of the word) is. There are lists of desiderata, but they read like scattered wishlists which don't obviously point to a unified underlying concept at all. There's also Eliezer's extremely meta pointer:
We can imagine, e.g., the AI imagining itself building a sub-AI while being prone to various sorts of errors, asking how it (the AI) would want the sub-AI to behave in those cases, and learning heuristics that would generalize well to how we would want the AI to behave if it suddenly gained a lot of capability or was considering deceiving its programmers and so on.
… and that's basically it.[1]
In this post, we’re going to explain a reasonably-unified concept which seems like a decent match to “corrigibility” in Eliezer's sense.
Tools
Starting point: we think [...]
---
Outline:
(00:05) Goal of This Post
(01:03) Tools
(02:40) Respecting Modularity
(04:49) Visibility and Correctability
(06:00) Let's Go Through A List Of Desiderata
(11:53) What Would It Look Like To Use A Powerful Corrigible AGI?
(14:11) From Cognition to Real Patterns?
The original text contained 6 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.