
ARC-AGI post
Getting 50% (SoTA) on ARC-AGI with GPT-4o
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs.
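To make the sample-then-select step concrete, here is a minimal sketch, not the author's actual pipeline: it assumes a hypothetical `sample_candidate_program` callable that wraps a GPT-4o request and returns Python source defining a `transform(grid)` function, and it keeps only the candidates that reproduce every training example exactly.

```python
# Minimal sketch of the sample-then-select loop described above.
# `sample_candidate_program` is a hypothetical placeholder for a GPT-4o call
# that returns Python source defining `transform(grid) -> grid`.

from typing import Callable

Grid = list[list[int]]


def run_candidate(source: str, grid: Grid) -> Grid | None:
    """Execute one candidate program and apply its transform to a grid."""
    namespace: dict = {}
    try:
        exec(source, namespace)              # candidate is expected to define transform()
        return namespace["transform"](grid)
    except Exception:
        return None                          # broken candidates simply fail selection


def select_programs(
    sample_candidate_program: Callable[[], str],  # hypothetical GPT-4o wrapper
    train_examples: list[tuple[Grid, Grid]],
    num_samples: int = 8000,
) -> list[str]:
    """Sample many candidate programs; keep those that solve every training example."""
    survivors: list[str] = []
    for _ in range(num_samples):
        source = sample_candidate_program()
        if all(run_candidate(source, x) == y for x, y in train_examples):
            survivors.append(source)
    return survivors
```

Surviving programs can then be run on the test input to produce submissions; per the outline below, the full method adds further steps (better prompting, code fixing, revision samples) on top of this bare pass/fail filter.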
[This post is on a pretty different topic than our usual posts on this Substack, so regular readers should be warned!]
The additional approaches and tweaks are:
---
Outline:
(00:10) Getting 50% (SoTA) on ARC-AGI with GPT-4o
(02:41) What is ARC-AGI?
(03:50) My method
(08:15) Detailed results
(08:19) What are the returns to more sampling?
(09:19) What are the returns to better prompting and code fixing?
(13:40) Qualitative analysis
(16:46) Caveats
(18:20) Predictions
(20:06) What it means about current LLMs
(23:27) What ARC-AGI tells us about AGI
(27:21) Appendix: A bunch of tricks used in my solutions
(34:41) Appendix: results for the train set
(34:56) Appendix: Returns to revision samples
The original text contained 15 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.