
Sign up to save your podcasts
Or
Here's a fairly concrete AGI safety proposal:
Default AGI design: Let's suppose we are starting with a pretrained LLM 'base model' and then we are going to do a ton of additional RL ('agency training') to turn it into a general-purpose autonomous agent. So, during training it'll do lots of CoT of 'reasoning' (think like how o1 does it) and then it'll output some text that the user or some external interface sees (e.g. typing into a browser, or a chat window), and then maybe it'll get some external input (the user's reply, etc.) and then the process repeats many times, and then some process evaluates overall performance (by looking at the entire trajectory as well as the final result) and doles out reinforcement.
Proposal part 1: Shoggoth/Face Distinction: Instead of having one model undergo agency training, we have two copies of the base model work [...]
---
Outline:
(13:47) Killing Canaries
(16:40) Basins of Attraction
---
First published:
Source:
Narrated by TYPE III AUDIO.
Here's a fairly concrete AGI safety proposal:
Default AGI design: Let's suppose we are starting with a pretrained LLM 'base model' and then we are going to do a ton of additional RL ('agency training') to turn it into a general-purpose autonomous agent. So, during training it'll do lots of CoT of 'reasoning' (think like how o1 does it) and then it'll output some text that the user or some external interface sees (e.g. typing into a browser, or a chat window), and then maybe it'll get some external input (the user's reply, etc.) and then the process repeats many times, and then some process evaluates overall performance (by looking at the entire trajectory as well as the final result) and doles out reinforcement.
Proposal part 1: Shoggoth/Face Distinction: Instead of having one model undergo agency training, we have two copies of the base model work [...]
---
Outline:
(13:47) Killing Canaries
(16:40) Basins of Attraction
---
First published:
Source:
Narrated by TYPE III AUDIO.
26,362 Listeners
2,380 Listeners
7,924 Listeners
4,131 Listeners
87 Listeners
1,447 Listeners
8,922 Listeners
88 Listeners
379 Listeners
5,425 Listeners
15,206 Listeners
475 Listeners
121 Listeners
77 Listeners
455 Listeners