
Sign up to save your podcasts
Or
This is a link-post for an exciting paper I recently read: Safety Pretraining: Toward the Next Generation of Safe AI by Pratyush Maini, Sachin Goyal, et al.
For a couple of years I (and others) have been proposing an approach to alignment: what the authors of this recent paper name "safety pretraining". In a nutshell: that it's best to apply your alignment training as part of the standard pretraining process to produce a base model that is already aligned — simply pretrain it on a lot of examples of aligned behavior.
I've regarded this approach as a major advance ever since I read the seminal 2023 paper on the topic: Pretraining Language Models with Human Preferences by Tomasz Korbak et al., and I'm absolutely delighted to finally see someone else publish another paper on this approach — I'm only sad it has taken so long.
I highly encourage [...]
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
This is a link-post for an exciting paper I recently read: Safety Pretraining: Toward the Next Generation of Safe AI by Pratyush Maini, Sachin Goyal, et al.
For a couple of years I (and others) have been proposing an approach to alignment: what the authors of this recent paper name "safety pretraining". In a nutshell: that it's best to apply your alignment training as part of the standard pretraining process to produce a base model that is already aligned — simply pretrain it on a lot of examples of aligned behavior.
I've regarded this approach as a major advance ever since I read the seminal 2023 paper on the topic: Pretraining Language Models with Human Preferences by Tomasz Korbak et al., and I'm absolutely delighted to finally see someone else publish another paper on this approach — I'm only sad it has taken so long.
I highly encourage [...]
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
26,469 Listeners
2,395 Listeners
7,953 Listeners
4,145 Listeners
89 Listeners
1,472 Listeners
9,207 Listeners
88 Listeners
426 Listeners
5,462 Listeners
15,335 Listeners
482 Listeners
121 Listeners
75 Listeners
461 Listeners