
OpenAI reports that o3-mini with high reasoning and a Python tool scores 32% on FrontierMath. However, Epoch's official evaluation[1] found a score of only 11%.
There are a few reasons to trust Epoch's score over OpenAI's:
Which had Python access.
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.