
Or rather, we don’t actually have a proper o1 system card, aside from the outside red teaming reports. At all.
Because, as I realized after writing my first draft of this, the data here does not reflect the o1 model they released, or o1 pro?
I think what happened is pretty bad on multiple levels.
---
Outline:
(02:18) Where Art Thou o1 System Card?
(05:35) Introduction (Section 1)
(06:01) Model Data and Training (Section 2)
(06:13) Challenges and Evaluations (Section 3)
(09:38) Jailbreak Evaluations (Section 3.1.2)
(11:33) Regurgitation (3.1.3) and Hallucinations (3.1.4)
(12:30) Fairness and Bias (3.1.5)
(13:33) Jailbreaks Through Custom Developer Messages (3.2)
(14:41) Chain of Thought Safety (3.3)
(18:52) External Red Teaming Via Pairwise Safety Comparisons (3.4.1)
(19:57) Jailbreak Arena (3.4.2)
(20:25) Apollo Research (3.4.3) and the ‘Escape Attempts’
(21:38) METR (3.4.4) and Autonomous Capability
(25:22) Preparedness Framework Evaluations (Section 4)
(27:47) Mitigations
(30:27) Cybersecurity
(31:22) Chemical and Biological Threats (4.5)
(31:52) Radiological and Nuclear Threat Creation (4.6)
(32:21) Persuasion (4.7)
(32:49) Model Autonomy (4.8)
(34:45) Multilingual Performance
(34:55) Conclusion
---
First published:
Source:
---
Narrated by TYPE III AUDIO.