April 20, 2026

EP158: The hidden blind spots of AI logic

18 minutes

The paper "Large Language Model Reasoning Failures" is a comprehensive survey that systematically categorizes and analyzes the various ways Large Language Models (LLMs) fail at reasoning tasks. To unify fragmented research in the field, the authors introduce a two-axis taxonomy that organizes failures based on the type of reasoning and the nature of the failure.

On the first axis, the taxonomy divides reasoning into embodied (physical world interaction) and non-embodied types, with the latter further split into informal (intuitive judgments) and formal (logical and mathematical) reasoning. On the second axis, failures are classified into three categories:

Fundamental failures: Intrinsic weaknesses in LLM architectures (e.g., the "reversal curse" or limited working memory) that broadly affect performance.
Application-specific limitations: Shortcomings that manifest in particular domains, such as Theory of Mind or 3D spatial planning.
Robustness issues: Inconsistencies where performance drops due to minor variations in prompt phrasing or task structure.

The paper provides detailed definitions for these failures, explores their root causes (such as the limitations of next-token prediction), and discusses mitigation strategies like Chain-of-Thought prompting and data-centric approaches. By providing a structured perspective and a public GitHub repository of related research, the survey aims to guide future work toward more reliable and robust reasoning in AI.


By Yun Wu
