
Summary
Four months after my post 'LLM Generality is a Timeline Crux', results on o1-preview update me significantly toward LLMs being capable of general reasoning, and hence of scaling straight to AGI.
Previous post
In June of 2024, I made a post, 'LLM Generality is a Timeline Crux', in which I argue
Reasons to update
In the original post, I gave the three main pieces of evidence against LLMs doing general reasoning that I found most compelling: blocksworld, planning/scheduling, and ARC-AGI (see original for details). All three of those seem significantly weakened in light of recent research.
Most dramatically, a new paper on blocksworld has recently been published by some of the same highly LLM-skeptical researchers (Valmeekam et al., led by Subbarao Kambhampati): 'LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench'. TODO
The planning/scheduling evidence seemed weaker almost immediately after the post [...]
---
Outline:
(00:04) Summary
(00:21) Previous post
(00:32) Reasons to update
(03:30) Updated probability estimates
---
First published:
Source:
Narrated by TYPE III AUDIO.