OpenAI reports that o3-mini with high reasoning effort and a Python tool scores 32% on FrontierMath. However, Epoch's official evaluation[1] found a score of only 11%.
There are a few reasons to trust Epoch's score over OpenAI's:
[1] Which had Python access.
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.