The MAD Podcast with Matt Turck

Chasing Real AGI: Inside ARC Prize 2025 with Chollet & Knoop



In this fascinating episode, we dive deep into the race toward true machine intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with François of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.


We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.


Ndea

Website - https://ndea.com

X/Twitter - https://x.com/ndea


ARC Prize

Website - https://arcprize.org

X/Twitter - https://x.com/arcprize


François Chollet

LinkedIn - https://www.linkedin.com/in/fchollet

X/Twitter - https://x.com/fchollet


Mike Knoop

X/Twitter - https://x.com/mikeknoop


FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap


Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck


(00:00) Intro

(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2

(02:07) What is ARC and how it differs from other AI benchmarks

(02:54) Why current models struggle with fluid intelligence

(03:52) Shift from static LLMs to test-time adaptation

(04:19) What ARC measures vs. traditional benchmarks

(07:52) Limitations of brute-force scaling in LLMs

(13:31) Defining intelligence: adaptation and efficiency

(16:19) How O3 achieved a massive leap in ARC performance

(20:35) Speculation on O3's architecture and test-time search

(22:48) Program synthesis: what it is and why it matters

(28:28) Combining LLMs with search and synthesis techniques

(34:57) The ARC Prize structure: efficiency track, private vs. public

(42:03) Open source as a requirement for progress

(44:59) What's new in ARC-AGI 2 and human benchmark testing

(48:14) Capabilities ARC-AGI 2 is designed to test

(49:21) When will ARC-AGI 2 be saturated? AGI timelines

(52:25) Founding of Ndea and why now

(54:19) Vision beyond AGI: a factory for scientific advancement

(56:40) What Ndea is building and why it's different from LLM labs

(58:32) Hiring and remote-first culture at Ndea

(59:52) Closing thoughts and the future of AI research



