
[This is a mildly-edited version of a Google Doc I wrote at OpenAI in July 2022. I had intended to get it published in some form, but never got around to it for various reasons. I have now received approval to put it up as a blog post. The main things of interest here are the distinctions I make, particularly the concept of self-location, and the examples in the appendix that illustrate them. I lump all three concepts (self-knowledge, self-location, introspection) together under the banner of Self-Awareness, but since that's a spicy term which may have other connotations, these days I'd probably use the more neutral term Situational Awareness.]
Summary.
---
Outline:
(02:03) Outline:
(02:57) Self-Knowledge
(03:00) What it means
(03:42) How to test for it
(04:23) Introspection
(04:27) What it means
(05:26) How to test for it
(09:53) Self-Location
(09:56) What it means
(12:08) How to test for it
(12:12) “Natural” Method: Look to see whether the model's performance/loss has a discontinuous improvement after it reads descriptions of its reward function and/or training environment.
(13:59) “Naive” Method: Just ask the model who it is.
(15:44) “Sophisticated” Method: Experiment designed to directly test self-location ability.
(16:57) Importance
(17:01) Self-awareness → Consciousness → Moral Patienthood
(18:04) Self-awareness → Strategic Awareness and Agency → APS-AI
(19:29) Self-awareness → Situational Awareness → Alignment Failures
(20:51) Recommendations
(21:02) Appendix
(21:15) Self-knowledge + introspection but no self-location:
(22:50) Self-knowledge + self-location but no introspection:
(24:14) Introspection + self-location but little self-knowledge:
---
First published:
Source:
Narrated by TYPE III AUDIO.