Machine Learning Street Talk (MLST)

Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners



Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they revealed how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.
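
The episode goes into how a depth-first search over tokens with a probability cutoff can replace plain sampling (see chapters 1.4 and 3.1 below). As a rough, self-contained illustration only, and not the winners' actual code, here is a minimal Python sketch of that general idea: every completion whose cumulative probability stays above a threshold is enumerated, and any branch that falls below the cutoff is pruned immediately. The next_token_logprobs callable and the toy distribution are hypothetical stand-ins.

import math

def dfs_completions(prefix, next_token_logprobs, eos_token, log_threshold, max_len=64):
    """Yield (tokens, logprob) for every completion whose cumulative
    log-probability stays at or above log_threshold."""
    stack = [(list(prefix), 0.0)]
    while stack:
        tokens, logp = stack.pop()
        if tokens and tokens[-1] == eos_token:
            yield tokens, logp
            continue
        if len(tokens) >= max_len:
            continue
        for tok, tok_logp in next_token_logprobs(tokens):
            child_logp = logp + tok_logp
            # Prune any branch whose cumulative probability falls below the cutoff.
            if child_logp >= log_threshold:
                stack.append((tokens + [tok], child_logp))

def toy_logprobs(tokens):
    # Hypothetical stand-in for a model call: one likely continuation and a
    # less likely stop token, with a forced stop after three tokens.
    if len(tokens) < 3:
        return [("a", math.log(0.8)), ("<eos>", math.log(0.2))]
    return [("<eos>", 0.0)]  # log(1.0) == 0.0

if __name__ == "__main__":
    # Enumerate every completion with probability >= 5%.
    for seq, lp in dfs_completions([], toy_logprobs, "<eos>", math.log(0.05)):
        print(seq, round(math.exp(lp), 3))
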


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/


Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and they host events in Zurich.


Go to https://tufalabs.ai/

***


Jan Disselhoff

https://www.linkedin.com/in/jan-disselhoff-1423a2240/


Daniel Franzen

https://github.com/da-fr


ARC Prize: http://arcprize.org/


TRANSCRIPT AND BACKGROUND READING:

https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0


TOC

1. Solution Architecture and Strategy Overview

[00:00:00] 1.1 Initial Solution Overview and Model Architecture

[00:04:25] 1.2 LLM Capabilities and Dataset Approach

[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies

[00:14:08] 1.4 Sampling Methods and Search Implementation

[00:17:52] 1.5 ARC vs Language Model Context Comparison


2. LLM Search and Model Implementation

[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation

[00:27:04] 2.2 Symmetry Augmentation and Model Architecture

[00:30:11] 2.3 Model Intelligence Characteristics and Performance

[00:37:23] 2.4 Tokenization and Numerical Processing Challenges


3. Advanced Training and Optimization

[00:45:15] 3.1 DFS Token Selection and Probability Thresholds

[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs

[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention

[00:56:10] 3.4 Training Infrastructure and Optimization Experiments

[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns


REFS

[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf


[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

https://arxiv.org/html/2411.14215


[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

https://github.com/michaelhodel/re-arc


[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

https://arxiv.org/html/2408.00724v2


[00:16:55] Language model reachability space exploration, University of Toronto

https://www.youtube.com/watch?v=Bpgloy1dDn0


[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt


[00:41:20] GPT tokenization approach for numbers, OpenAI

https://platform.openai.com/docs/guides/text-generation/tokenizer-examples


[00:46:25] DFS in AI search strategies, Russell & Norvig

https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997


[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

https://www.pnas.org/doi/10.1073/pnas.1611835114


[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

https://arxiv.org/abs/2106.09685


[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/


[01:04:55] Original MCTS in computer Go, Yifan Jin

https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf


