The MAD Podcast with Matt Turck

Chasing Real AGI: Inside ARC Prize 2025 with Chollet & Knoop



In this episode, we dive deep into the race toward true AI intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder with François of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like O3 mark a step change in capabilities, and what it will really take to reach AGI.


We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.


Ndea

Website - https://ndea.com

X/Twitter - https://x.com/ndea


ARC Prize

Website - https://arcprize.org

X/Twitter - https://x.com/arcprize


François Chollet

LinkedIn - https://www.linkedin.com/in/fchollet

X/Twitter - https://x.com/fchollet


Mike Knoop

X/Twitter - https://x.com/mikeknoop


FirstMark

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap


Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck


(00:00) Intro

(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2

(02:07) What is ARC and how it differs from other AI benchmarks

(02:54) Why current models struggle with fluid intelligence

(03:52) Shift from static LLMs to test-time adaptation

(04:19) What ARC measures vs. traditional benchmarks

(07:52) Limitations of brute-force scaling in LLMs

(13:31) Defining intelligence: adaptation and efficiency

(16:19) How O3 achieved a massive leap in ARC performance

(20:35) Speculation on O3's architecture and test-time search

(22:48) Program synthesis: what it is and why it matters

(28:28) Combining LLMs with search and synthesis techniques

(34:57) The ARC Prize structure: efficiency track, private vs. public

(42:03) Open source as a requirement for progress

(44:59) What's new in ARC-AGI 2 and human benchmark testing

(48:14) Capabilities ARC-AGI 2 is designed to test

(49:21) When will ARC-AGI 2 be saturated? AGI timelines

(52:25) Founding of Ndea and why now

(54:19) Vision beyond AGI: a factory for scientific advancement

(56:40) What Ndea is building and why it's different from LLM labs

(58:32) Hiring and remote-first culture at Ndea

(59:52) Closing thoughts and the future of AI research
