They disagreed. Instead, we got this (here's the announcement, in which Sam Altman says ‘they thought it would be fun’ to go from one frontier model to their next frontier model; yeah, that's what I'm feeling, fun):
Greg Brockman (President of OpenAI): o3, our latest reasoning model, is a breakthrough, with a step function improvement on our most challenging benchmarks. We are starting safety testing and red teaming now.
---
Outline:
(03:48) GPQA Has Fallen
(04:21) Codeforces Has Fallen
(05:32) Arc Has Kinda Fallen But For Now Only Kinda
(09:27) They Trained on the Train Set
(15:26) AIME Has Fallen
(15:58) Frontier of Frontier Math Shifting Rapidly
(19:09) FrontierMath 4: We're Going To Need a Bigger Benchmark
(23:10) What is o3 Under the Hood?
(25:17) Not So Fast!
(28:38) Deep Thought
(30:03) Our Price Cheap
(36:32) Has Software Engineering Fallen?
(37:42) Don't Quit Your Day Job
(40:48) Master of Your Domain
(43:21) Safety Third
(47:56) The Safety Testing Program
(48:58) Safety testing in the reasoning era
(51:01) How to apply
(53:07) What Could Possibly Go Wrong?
(56:36) What Could Possibly Go Right?
(57:06) Send in the Skeptic
(59:25) This is Almost Certainly Not AGI
(01:02:57) Does This Mean the Future is Open Models?
(01:07:17) Not Priced In
(01:08:39) Our Media is Failing Us
(01:14:56) Not Covered Here: Deliberative Alignment
(01:15:08) The Lighter Side
---