
This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.)
Epistemic status: subjective impressions plus one new graph plus 300 links.
Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis.
tl;dr
---
Outline:
(00:43) tl;dr
(04:02) Capabilities in 2025
(04:21) Arguments against 2025 capabilities growth being above-trend
(10:24) Arguments for 2025 capabilities growth being above-trend
(19:46) Evals crawling towards ecological validity
(22:55) Safety in 2025
(23:00) Are reasoning models safer than the old kind?
(26:05) The looming end of evals
(28:07) Prosaic misalignment
(30:34) What is the plan?
(33:10) Things which might fundamentally change the nature of LLMs
(34:59) Emergent misalignment and model personas
(36:42) Monitorability
(38:27) New people
(39:01) Overall
(39:37) Discourse in 2025
The original text contained 9 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
[Image: screenshot of a poll; the visible option, ending "4x)", has 32.5% of the vote, with 157 votes cast and 38 minutes remaining.]
[Image: meme reading "it's so over" / "Owain et al: it's actually really easy to make an evil AI" -> "we're so back".]
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrong
