
There's been a lot of discussion among safety-concerned people about whether it was bad for Anthropic to release Claude-3. I felt like I didn’t have a great picture of all the considerations here, and I felt that people were conflating many different types of arguments for why it might be bad. So I decided to try to write down an at-least-slightly-self-contained description of my overall views and reasoning here.
Tabooing “Race Dynamics”
I’ve heard a lot of people say that this “is bad for race dynamics”. I think that this conflates a couple of different mechanisms by which releasing Claude-3 might have been bad.
So, tabooing "race dynamics": a common narrative behind these words is:
"As companies release better & better models, this incentivizes other companies to pursue more capable models at the expense of safety. Eventually, one company goes too far, produces unaligned AGI, and we all die."
---
Outline:
(00:28) Tabooing “Race Dynamics”
(01:28) Did releasing Claude-3 cause other AI labs to invest less in evals/redteaming models before deployment?
(02:43) Did releasing Claude-3 divert resources away from alignment research and into capabilities research?
(03:48) Releasing Very SOTA Models
(04:18) Anthropic at the Frontier is Good?
(06:18) What I Currently Think
(06:59) Conclusion
(08:34) Appendix:
(08:38) Capabilities Leakage
(12:37) Is Increasing Capabilities Bad?
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
