December 30, 2024

“o3, Oh My” by Zvi

1 hour 17 minutes

OpenAI presented o3 on the Friday before Christmas, at the tail end of the 12 Days of Shipmas.

I was very much expecting the announcement to be something like a price drop. What better way to say ‘Merry Christmas,’ no?

They disagreed. Instead, we got this (here’s the announcement, in which Sam Altman says ‘they thought it would be fun’ to go from one frontier model to their next frontier model, yeah, that’s what I’m feeling, fun):

Greg Brockman (President of OpenAI): o3, our latest reasoning model, is a breakthrough, with a step function improvement on our most challenging benchmarks. We are starting safety testing and red teaming now.

Nat McAleese (OpenAI): o3 represents substantial progress in general-domain reasoning with reinforcement learning—excited that we were able to announce some results today! Here is a summary of what we shared about o3 in the livestream.

---

Outline:

(03:48) GPQA Has Fallen

(04:21) Codeforces Has Fallen

(05:32) Arc Has Kinda Fallen But For Now Only Kinda

(09:27) They Trained on the Train Set

(15:26) AIME Has Fallen

(15:58) Frontier of Frontier Math Shifting Rapidly

(19:09) FrontierMath 4: We're Going To Need a Bigger Benchmark

(23:10) What is o3 Under the Hood?

(25:17) Not So Fast!

(28:38) Deep Thought

(30:03) Our Price Cheap

(36:32) Has Software Engineering Fallen?

(37:42) Don't Quit Your Day Job

(40:48) Master of Your Domain

(43:21) Safety Third

(47:56) The Safety Testing Program

(48:58) Safety testing in the reasoning era

(51:01) How to apply

(53:07) What Could Possibly Go Wrong?

(56:36) What Could Possibly Go Right?

(57:06) Send in the Skeptic

(59:25) This is Almost Certainly Not AGI

(01:02:57) Does This Mean the Future is Open Models?

(01:07:17) Not Priced In

(01:08:39) Our Media is Failing Us

(01:14:56) Not Covered Here: Deliberative Alignment

(01:15:08) The Lighter Side

The original text contained 22 images which were described by AI.

---

First published:

December 30th, 2024

Source:

https://www.lesswrong.com/posts/QHtd2ZQqnPAcknDiQ/o3-oh-my

---

Narrated by TYPE III AUDIO.

By LessWrong

