Vanishing Gradients

Episode 48: How to Benchmark AGI with Greg Kamradt



If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it.

In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on François Chollet’s definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today’s top models fall short.

They discuss:

🧠 Why we still lack a shared definition of intelligence
🧪 How ARC tasks force models to learn novel skills at test time
📉 Why GPT-4-class models still underperform on ARC
🔎 The limits of traditional benchmarks like MMLU and Big-Bench
⚙️ What the OpenAI o3 results reveal—and what they don’t
💡 Why generalization and efficiency, not raw capability, are key to AGI

Greg also shares what he’s seeing in the wild: how startups and independent researchers are using ARC as a North Star, how benchmarks shape the frontier, and why the ARC team believes we’ll know we’ve reached AGI when humans can no longer write tasks that models can’t solve.

This conversation is about evaluation—not hype. If you care about where AI is really headed, this one’s worth your time.

LINKS

  • ARC Prize -- What is ARC-AGI?
  • On the Measure of Intelligence by François Chollet
  • Greg Kamradt on Twitter
  • Hugo's High Signal Podcast with Fei-Fei Li
  • Vanishing Gradients YouTube Channel
  • Upcoming Events on Luma
  • Hugo's recent newsletter about upcoming events and more!
  • Watch the podcast here on YouTube!
  • 🎓 Want to go deeper?

    Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers.
    Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in.
    This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful.

    Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more.

    Cohort starts July 8 — Use this link for a 10% discount

Vanishing Gradients, by Hugo Bowne-Anderson

