AI Explained Official Podcast

o3 - wow



o3 is one of the biggest developments in AI in 2+ years, not because it beats a particular benchmark, but because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, the benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.


FrontierMath: https://epoch.ai/frontiermath

https://arxiv.org/pdf/2411.04872

Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough

MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1

Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614

Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/

Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893

SWE-bench Verified: https://openai.com/index/introducing-swe-bench-verified/

Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518

David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638

OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/

https://simple-bench.com/

John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725


00:00 - Introduction

01:19 - What is o3?

03:18 - FrontierMath

05:15 - o4, o5

06:03 - GPQA

06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

08:13 - 1st Caveat

09:03 - Compositionality?

10:16 - SimpleBench?

13:11 - ARC-AGI, Chollet




By Philip - Host of AI Explained YT


