


Anthropic are now actively using the approach to alignment often called “Alignment Pretraining” or “Safety Pretraining”: using Stochastic Gradient Descent on a large body of natural or synthetic documents showing the AI assistant doing the right thing. They tried this out, found it works well, and are now using it.
I’m absolutely delighted. I’ve been advocating this approach on LessWrong and the Alignment Forum for several years:
I’ve been very excited about this alignment technique for a couple of years, ever since I read the seminal paper demonstrating that it was extremely effective, Pretraining Language Models with Human Preferences (Korbak et al., ’23). This was later followed up by Safety Pretraining: Toward the Next Generation [...]
---
Linkpost URL:
https://www.anthropic.com/research/teaching-claude-why
---
Narrated by TYPE III AUDIO.
By LessWrong
