


A key risk factor for scheming (and misalignment more generally) is opaque reasoning ability.
Important caveat: To get human completion times, I ask Opus 4.5 (with thinking) to estimate how long it would take the median AIME participant to complete a given problem.
Here are the 50% reliability time horizon results:
I find that Opus 4.5 has a no-CoT 50% reliability time horizon of 3.5 minutes and that this time horizon has been doubling every 9 months.
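To make the doubling claim concrete, here is a minimal sketch of what a 3.5 minute horizon doubling every 9 months would imply if the trend simply continued. The function names are hypothetical, and the constant-doubling extrapolation is an assumption made for illustration, not the fitting procedure used in the post.

```python
import math

# Figures quoted in the episode description.
CURRENT_HORIZON_MINUTES = 3.5
DOUBLING_TIME_MONTHS = 9.0


def projected_horizon_minutes(months_ahead: float) -> float:
    """Horizon after `months_ahead` months, assuming constant exponential doubling."""
    return CURRENT_HORIZON_MINUTES * 2 ** (months_ahead / DOUBLING_TIME_MONTHS)


def months_until(target_minutes: float) -> float:
    """Months until the horizon reaches `target_minutes` under the same assumption."""
    return DOUBLING_TIME_MONTHS * math.log2(target_minutes / CURRENT_HORIZON_MINUTES)


if __name__ == "__main__":
    for months in (0, 9, 18, 36):
        print(f"{months:>2} months ahead: {projected_horizon_minutes(months):.1f} min")
    print(f"Months until a 60 minute horizon: {months_until(60):.1f}")
```

Under this assumption the horizon would be 7 minutes after one doubling period and 14 minutes after two; whether the trend actually holds is an empirical question the sketch does not address.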
In an earlier post (Recent LLMs can leverage filler tokens or repeated problems to improve (no-CoT) math performance), I found that repeating the problem substantially boosts performance.
---
Outline:
(02:33) Some details about the time horizon fit and data
(04:17) Analysis
(06:59) Appendix: scores for Gemini 3 Pro
(11:41) Appendix: full result tables
(11:46) Time Horizon - Repetitions
(12:07) Time Horizon - Filler
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
