

This paper discusses how successful RL fine-tuning uncovers an emergent two-phase hierarchical reasoning dynamic in LLMs, mirroring human cognition by separating high-level strategic planning from low-level procedural execution. The authors argue that conventional RL methods, which apply optimization pressure agnostically to all tokens, are inefficient because they fail to concentrate learning on the true bottleneck: mastering strategic planning tokens. The proposed method, HICRA, addresses this by selectively amplifying the learning signal for these high-impact planning tokens, and extensive experiments show that this targeted approach significantly outperforms baselines like GRPO across various mathematical and multimodal benchmarks. The paper also introduces Strategic Grams and Semantic Entropy as diagnostic tools for tracking strategic exploration, revealing why common metrics like token-level entropy are often misleading.
By Enoch H. Kang
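
For illustration only, the core idea behind HICRA's selective credit assignment can be sketched as a reweighting of per-token advantages: tokens identified as strategic planning tokens (for example, by matching a list of Strategic Grams) have their learning signal amplified, while execution tokens are left unchanged. The function name, the multiplicative form, and the `alpha` coefficient below are assumptions for this sketch; the paper only states that HICRA amplifies the learning signal on planning tokens within a GRPO-style objective.

```python
import torch

def hicra_weighted_advantages(advantages: torch.Tensor,
                              planning_mask: torch.Tensor,
                              alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical sketch of HICRA-style credit assignment.

    advantages:    (batch, seq_len) per-token advantages from a GRPO-style objective
    planning_mask: (batch, seq_len) 1.0 where the token falls inside a strategic
                   planning span (e.g., matched against Strategic Grams), else 0.0
    alpha:         assumed amplification coefficient; the paper's exact weighting
                   scheme and value are not specified here
    """
    # Amplify the learning signal on planning tokens; execution tokens keep
    # their original advantage.
    return advantages * (1.0 + alpha * planning_mask)
```

The reweighted advantages would then replace the uniform per-token advantages in the policy-gradient update, concentrating optimization pressure on the planning tokens the paper identifies as the bottleneck.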