


Summary
Four months after my post 'LLM Generality is a Timeline Crux', results on o1-preview update me significantly toward LLMs being capable of general reasoning, and hence of scaling straight to AGI.
Previous post
In June of 2024, I made a post, 'LLM Generality is a Timeline Crux', in which I argue
Reasons to update
In the original post, I gave the three main pieces of evidence against LLMs doing general reasoning that I found most compelling: blocksworld, planning/scheduling, and ARC-AGI (see original for details). All three of those seem significantly weakened in light of recent research.
Most dramatically, a new paper on blocksworld has recently been published by some of the same highly LLM-skeptical researchers (Valmeekam et al., led by Subbarao Kambhampati): 'LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench'.
The planning/scheduling evidence seemed weaker almost immediately after the post [...]
---
Outline:
(00:04) Summary
(00:21) Previous post
(00:32) Reasons to update
(03:30) Updated probability estimates
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong