The paper "Agentic Reasoning for Large Language Models" provides a comprehensive survey of the paradigm shift from traditional, passive LLM inference to agentic reasoning. In this new framework, LLMs are treated as autonomous agents that interleave deliberation with environmental interaction, enabling them to plan, act, and learn continually.
The authors organize the landscape of agentic reasoning into three primary layers:
- Foundational Agentic Reasoning: This establishes the core capabilities a single agent needs to operate autonomously, specifically focusing on planning, tool use, and dynamic search/retrieval.
- Self-Evolving Agentic Reasoning: This layer explores how agents continuously improve and adapt over time through feedback mechanisms (such as self-critique, verification, and environmental signals) and persistent agentic memory, allowing them to learn from past interactions.
- Collective Multi-Agent Reasoning: This dimension scales intelligence to collaborative ecosystems. It examines how multiple agents take on specialized roles (e.g., manager, worker, critic) to divide labor, debate, share memory, and coordinate to solve highly complex tasks.
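The foundational layer above can be illustrated with a minimal plan-act-observe loop. This is a sketch, not an algorithm from the paper: the model is a hard-coded stub standing in for an LLM, and the tool registry, action format, and `run_agent` helper are all illustrative assumptions.

```python
# Minimal sketch of a single-agent plan-act-observe loop.
# `stub_model` is a hard-coded stand-in for an LLM policy; the action
# format and helpers are illustrative assumptions, not the paper's API.

def calculator(expression: str) -> str:
    """A toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(transcript: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: decide the next action from the transcript."""
    if not any(line.startswith("Observation:") for line in transcript):
        # Plan: delegate arithmetic to a tool rather than answering directly.
        return ("tool", "calculator", "6 * 7")
    observation = transcript[-1].split(": ", 1)[1]
    return ("answer", "", f"The result is {observation}.")

def run_agent(task: str, max_steps: int = 5) -> str:
    """Interleave reasoning (model calls) with environment interaction (tools)."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        kind, tool_name, payload = stub_model(transcript)
        if kind == "answer":
            return payload
        result = TOOLS[tool_name](payload)           # act in the environment
        transcript.append(f"Observation: {result}")  # feed the result back
    return "Gave up after max_steps."

print(run_agent("What is 6 * 7?"))  # -> The result is 42.
```

In a real agent the stub would be replaced by an LLM call, the transcript would be its prompt context, and the tool set would include search or retrieval, matching the planning, tool-use, and dynamic-search capabilities the foundational layer describes.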
Across all three layers, the survey categorizes optimization strategies into two modes: in-context reasoning (which scales test-time interaction through prompting, search, and workflow orchestration without updating model weights) and post-training reasoning (which internalizes successful reasoning behaviors into the model's parameters via reinforcement learning and supervised fine-tuning).
Finally, the paper contextualizes this framework by reviewing real-world applications and benchmarks across diverse domains—including mathematics/coding, scientific discovery, embodied robotics, healthcare, and autonomous web exploration. It concludes by identifying critical open challenges for the future, such as user personalization, long-horizon credit assignment, integration with world models, and governance/safety guardrails for real-world deployment.