Surfstudio podcast

The Illusion of Thinking: How AI's Reasoning Breaks Under Pressure

This study investigates Large Reasoning Models (LRMs), models that generate explicit "thinking" processes such as Chain-of-Thought before answering. The authors evaluate them in **controllable puzzle environments**, which avoid data contamination and allow fine-grained analysis of the models' thinking traces.
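
A minimal sketch of what such an environment might look like, using Tower of Hanoi (one of the puzzles in the study), where the disk count serves as the single complexity knob. The function names and structure here are illustrative, not the authors' code:

```python
# Sketch: a controllable Tower of Hanoi environment.
# Complexity is one parameter (n_disks); every proposed move in a
# model's output can be checked step by step, so both final answers
# and intermediate thinking traces are verifiable.

def initial_state(n_disks):
    """Three pegs; all disks start on peg 0, largest at the bottom."""
    return [list(range(n_disks, 0, -1)), [], []]

def apply_move(state, src, dst):
    """Apply one move in place, raising on any rule violation."""
    if not state[src]:
        raise ValueError(f"peg {src} is empty")
    disk = state[src][-1]
    if state[dst] and state[dst][-1] < disk:
        raise ValueError(f"cannot place disk {disk} on smaller disk {state[dst][-1]}")
    state[src].pop()
    state[dst].append(disk)

def verify_solution(n_disks, moves):
    """Return (solved, first_invalid_index) for a model-proposed move list."""
    state = initial_state(n_disks)
    for i, (src, dst) in enumerate(moves):
        try:
            apply_move(state, src, dst)
        except ValueError:
            return False, i
    solved = state[2] == list(range(n_disks, 0, -1))
    return solved, None

# Example: grade a 3-disk attempt (optimal length is 2**3 - 1 = 7 moves).
moves = [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
print(verify_solution(3, moves))  # (True, None)
```

Because correctness is checkable move by move, the same setup scales smoothly from trivial to intractable instances, which is what makes the three regimes below observable.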

Key findings reveal **three performance regimes**:
*   **Low complexity**: Surprisingly, standard LLMs often outperform LRMs while using tokens more efficiently.
*   **Medium complexity**: LRMs show an advantage due to their "thinking" mechanisms.
*   **High complexity**: **Both LRMs and standard LLMs experience complete accuracy collapse**.

Counter-intuitively, LRMs **reduce their reasoning effort** (measured in thinking tokens) as problems approach the collapse point, despite having ample token budget remaining. Furthermore, LRMs exhibit **limitations in exact computation and in consistently following explicit algorithms**: even providing the correct solution algorithm in the prompt did not improve performance. These findings suggest current LRMs face **fundamental barriers to generalizable and robust reasoning**.
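
To illustrate the explicit-algorithm finding: the standard recursive solution to Tower of Hanoi is short enough to hand to a model verbatim, yet per the findings above, doing so did not raise the collapse threshold. A sketch of that algorithm (an assumed form, not the paper's exact prompt):

```python
def hanoi(n, src, aux, dst, moves):
    """Standard recursive algorithm: move n disks from src to dst.

    This is the kind of explicit procedure the study reports providing
    to models; executing it faithfully requires only exact step-by-step
    bookkeeping, no search.
    """
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)   # clear the top n-1 disks onto aux
    moves.append((src, dst))             # move the largest remaining disk
    hanoi(n - 1, aux, src, dst, moves)   # restack the n-1 disks onto dst

moves = []
hanoi(3, 0, 1, 2, moves)
print(moves)  # the same optimal 7-move sequence verified above
```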


By CCStudios