
Sign up to save your podcasts
Or


OpenAI reports that o3-mini with high reasoning and a Python tool receives a 32% on FrontierMath. However, Epoch's official evaluation[1] received only 11%.
There are a few reasons to trust Epoch's score over OpenAIs:
Which had Python access.
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrongOpenAI reports that o3-mini with high reasoning and a Python tool receives a 32% on FrontierMath. However, Epoch's official evaluation[1] received only 11%.
There are a few reasons to trust Epoch's score over OpenAIs:
Which had Python access.
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.

113,257 Listeners

132 Listeners

7,261 Listeners

564 Listeners

16,482 Listeners

4 Listeners

14 Listeners

2 Listeners