Machine Learning Street Talk (MLST)

Francois Chollet - ARC reflections - NeurIPS 2024



François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/


Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?


They are hosting an event in Zurich on January 9th with the ARChitects; join if you can.


Go to https://tufalabs.ai/

***


Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say):

https://arcprize.org/blog/oai-o3-pub-breakthrough


TOC:

1. Introduction and Opening

[00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View

[00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions

[00:01:31] 1.3 Defining Reasoning


3. ARC Competition 2024 Results and Evolution

[00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2

[00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions

[00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training


4. Transduction vs. Induction in ARC

[00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization

[00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search


5. ARC-2 Development and Future Directions

[00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2

[00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets

[00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology


6. Program Synthesis Approaches

[00:30:18] 6.1 Induction vs. Transduction

[00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks

[00:34:23] 6.3 Combining Induction and Transduction

[00:37:05] 6.4 Multi-View Insight and Overfitting Regulation


7. Latent Space and Graph-Based Synthesis

[00:38:17] 7.1 Clément Bonnet’s Latent Program Search Approach

[00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search

[00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation

[00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions


8. Compute Efficiency and Lifelong Learning

[00:48:05] 8.1 Symbolic Process for Architecture Generation

[00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy

[00:52:20] 8.3 Learning New Building Blocks for Future Tasks


9. AI Reasoning and Future Development

[00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning

[00:56:30] 9.2 Reconciling Symbolic and Connectionist Views

[01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency

[01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability


10. Program Synthesis and Research Lab

[01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis

[01:09:55] 10.2 Democratizing Programming and Natural Language Instruction


11. Frontier Models and o1 Architecture

[01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass

[01:16:55] 11.2 o1’s Natural Language Program Generation and Test-Time Compute Scaling

[01:19:35] 11.3 Logarithmic Gains with Deeper Search


12. ARC Evaluation and Human Intelligence

[01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues

[01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor

[01:26:16] 12.3 Closing Remarks and Future Directions


SHOWNOTES PDF:

https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0
