There's been a lot of discussion among safety-concerned people about whether it was bad for Anthropic to release Claude-3. I felt like I didn’t have a great picture of all the considerations here, and I felt that people were conflating many different types of arguments for why it might be bad. So I decided to try to write down an at-least-slightly-self-contained description of my overall views and reasoning here.
Tabooing “Race Dynamics”
I’ve heard a lot of people say that this “is bad for race dynamics”. I think that this conflates a couple of different mechanisms by which releasing Claude-3 might have been bad.
So, tabooing “race dynamics”, a common narrative behind these words is:
“As companies release better and better models, this incentivizes other companies to pursue more capable models at the expense of safety. Eventually, one company goes too far, produces unaligned AGI, and we all die.”
---
Outline:
(00:28) Tabooing “Race Dynamics”
(01:28) Did releasing Claude-3 cause other AI labs to invest less in evals/redteaming models before deployment?
(02:43) Did releasing Claude-3 divert resources away from alignment research and into capabilities research?
(03:48) Releasing Very SOTA Models
(04:18) Anthropic at the Frontier is Good?
(06:18) What I Currently Think
(06:59) Conclusion
(08:34) Appendix:
(08:38) Capabilities Leakage
(12:37) Is Increasing Capabilities Bad?
---
First published:
Source:
Narrated by TYPE III AUDIO.