


Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different, such as learning that the world works via special relativity or seeing a picture of a novel sausage-bread combination. Stuart Armstrong has been thinking for a while about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thinking on the topic currently stands, how it relates to AI alignment, and what the open questions are.
Topics we discuss, and timestamps:
- 00:00:44 - What is concept extrapolation
- 00:15:25 - When is concept extrapolation possible
- 00:30:44 - A toy formalism
- 00:37:25 - Uniqueness of extrapolations
- 00:48:34 - Unity of concept extrapolation methods
- 00:53:25 - Concept extrapolation and corrigibility
- 00:59:51 - Is concept extrapolation possible?
- 01:37:05 - Misunderstandings of Stuart's approach
- 01:44:13 - Following Stuart's work
The transcript: axrp.net/episode/2022/09/03/episode-18-concept-extrapolation-stuart-armstrong.html
Stuart's startup, Aligned AI: aligned-ai.com
Research we discuss:
- The Concept Extrapolation sequence: alignmentforum.org/s/u9uawicHx7Ng7vwxA
- The HappyFaces benchmark: github.com/alignedai/HappyFaces
- Goal Misgeneralization in Deep Reinforcement Learning: arxiv.org/abs/2105.14111
By Daniel Filan