
ARC-AGI post
Getting 50% (SoTA) on ARC-AGI with GPT-4o
I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs.
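To make the selection step concrete, here is a minimal sketch of filtering sampled programs by their behavior on a task's worked examples. This is an illustration rather than the author's actual code; it assumes each candidate has already been turned into a Python callable `transform(grid)` and that each ARC example is an (input, output) pair of integer grids.

```python
# Minimal sketch of the selection step: keep only the candidate programs
# that reproduce every training example exactly. Names and structure here
# are assumptions for illustration, not the author's implementation.

from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

def solves_examples(transform: Callable[[Grid], Grid], examples: List[Example]) -> bool:
    """Return True if the candidate maps every example input to its expected output."""
    for inp, expected in examples:
        try:
            if transform(inp) != expected:
                return False
        except Exception:
            # Sampled programs often crash on some inputs; treat that as a failure.
            return False
    return True

def select_candidates(candidates: List[Callable[[Grid], Grid]],
                      examples: List[Example]) -> List[Callable[[Grid], Grid]]:
    """Filter the ~8,000 sampled programs down to those consistent with the examples."""
    return [c for c in candidates if solves_examples(c, examples)]
```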
[This post is on a pretty different topic than the usual posts on our substack. So regular readers should be warned!]
The additional approaches and tweaks are:
---
Outline:
(00:10) Getting 50% (SoTA) on ARC-AGI with GPT-4o
(02:41) What is ARC-AGI?
(03:50) My method
(08:15) Detailed results
(08:19) What are the returns to more sampling?
(09:19) What are the returns to better prompting and code fixing?
(13:40) Qualitative analysis
(16:46) Caveats
(18:20) Predictions
(20:06) What it means about current LLMs
(23:27) What ARC-AGI tells us about AGI
(27:21) Appendix: A bunch of tricks used in my solutions
(34:41) Appendix: results for the train set
(34:56) Appendix: Returns to revision samples
The original text contained 15 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
