
The paper "Large Language Model Reasoning Failures" is a comprehensive survey that systematically categorizes and analyzes the various ways Large Language Models (LLMs) fail at reasoning tasks. To unify fragmented research in the field, the authors introduce a two-axis taxonomy that organizes failures based on the type of reasoning and the nature of the failure.
The taxonomy divides reasoning into embodied (physical world interaction) and non-embodied types, with the latter further split into informal (intuitive judgments) and formal (logical and mathematical) reasoning. On the second axis, failures are classified into three categories.
The paper provides detailed definitions for these failures, explores their root causes—such as the limitations of next-token prediction—and discusses mitigation strategies like Chain-of-Thought prompting and data-centric approaches. By providing a structured perspective and a public GitHub repository of related research, the survey aims to guide future work toward developing more reliable and robust reasoning capabilities in AI.
By Yun Wu