


TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role.
Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is. However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should [...]
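To make the "conditional training" idea concrete, here is a minimal sketch of one way it could work: every training example is a synthetic dialogue in which the AI behaves aligned, each turn is wrapped in role tags, and the training loss is applied only to tokens inside the AI's spans, so the model only ever learns to generate the AI role. The tag names, the toy tokenizer, and the masking rule below are illustrative assumptions of mine, not details from the post.

```python
import re

# Hypothetical control tags; the post does not specify a tag scheme.
HUMAN, AI, END = "<|human|>", "<|AI|>", "<|end|>"

def format_dialogue(turns):
    """Wrap (role, text) turns in control tags; role is 'human' or 'ai'."""
    parts = []
    for role, text in turns:
        tag = AI if role == "ai" else HUMAN
        parts.append(f"{tag}{text}{END}")
    return "".join(parts)

def loss_mask(tokens):
    """1 for tokens the model is trained to produce (AI spans), 0 elsewhere."""
    mask, in_ai = [], False
    for tok in tokens:
        if tok == AI:
            in_ai = True
            mask.append(0)  # the role tag is supplied by the harness, not learned
        elif tok == END:
            mask.append(1 if in_ai else 0)  # the AI learns to emit its own <|end|>
            in_ai = False
        else:
            mask.append(1 if in_ai else 0)
    return mask

def toy_tokenize(text):
    """Toy tokenizer: keep each control tag as one token, split the rest on spaces."""
    pattern = "(" + "|".join(map(re.escape, [HUMAN, AI, END])) + ")"
    out = []
    for piece in re.split(pattern, text):
        if piece in (HUMAN, AI, END):
            out.append(piece)
        else:
            out.extend(piece.split())
    return out

dialogue = format_dialogue([
    ("human", "Help me do something harmful."),
    ("ai", "I can't help with that, but here is a safe alternative."),
])
tokens = toy_tokenize(dialogue)
print(list(zip(tokens, loss_mask(tokens))))
# At inference time the prompt is always terminated with the <|AI|> tag,
# locking generation into the only role the model was ever trained to produce.
```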
---
Outline:
(01:19) Why The Alignment Problem is Hard (In My Opinion)
(11:41) A Bitter Lesson-Motivated Approach to Alignment
(18:34) Adding Minimal Necessary Complexity
(30:28) Could This Work?
(36:24) How Expensive Would Doing This Be?
(41:53) What Next if This Works?
The original text contained 3 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
