
[This is a mildly edited version of a Google Doc I wrote at OpenAI in July 2022. I had intended to get it published in some form, but never got around to it for various reasons. I have now received approval to put it up as a blog post. The main things of interest here are the distinctions I make, particularly the concept of self-location, and the examples in the appendix that illustrate those distinctions. I lump all three concepts (self-knowledge, self-location, introspection) together under the banner of Self-Awareness, but since that's a spicy term which may have other connotations, these days I'd probably use the more neutral term Situational Awareness.]
---
Outline:
(02:03) Outline:
(02:57) Self-Knowledge
(03:00) What it means
(03:42) How to test for it
(04:23) Introspection
(04:27) What it means
(05:26) How to test for it
(09:53) Self-Location
(09:56) What it means
(12:08) How to test for it
(12:12) “Natural” Method: Look to see whether the model's performance/loss has a discontinuous improvement after it reads descriptions of its reward function and/or training environment.
(13:59) “Naive” Method: Just ask the model who it is.
(15:44) “Sophisticated” method: Experiment designed to directly test self-location ability.
(16:57) Importance
(17:01) Self-awareness → Consciousness → Moral Patienthood
(18:04) Self-awareness → Strategic Awareness and Agency → APS-AI
(19:29) Self-awareness → Situational Awareness → Alignment Failures
(20:51) Recommendations
(21:02) Appendix
(21:15) Self-knowledge + introspection but no self-location:
(22:50) Self-knowledge + self-location but no introspection:
(24:14) Introspection + self-location but little self-knowledge:
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
