Seventy3: paper walkthroughs powered by NotebookLM, focused on artificial intelligence, large language models, and robotics algorithms, so listeners can keep learning alongside AI.
To join the group, add the assistant on WeChat: seventy3_podcast
Note: 小宇宙
Today's topic: Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Summary
This paper introduces Self-Backtracking, a novel technique to enhance the reasoning of large language models (LLMs) by enabling them to internally manage a search process with backtracking capabilities. The method trains LLMs to recognize suboptimal reasoning paths and autonomously backtrack to explore alternatives during both training and inference. This internalization of backtracking aims to address issues like inefficient overthinking and over-reliance on external reward models prevalent in existing slow-thinking approaches. Empirical evaluations on a mathematical reasoning task demonstrate that Self-Backtracking significantly improves performance, and a self-improvement process further refines the model's fast-thinking abilities. The research suggests a promising direction for developing more advanced and efficient LLM reasoners by integrating a fundamental search mechanism directly within the model.
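The core idea described above — exploring reasoning paths and stepping back from suboptimal ones — can be illustrated with a minimal depth-first search sketch. This is not the paper's implementation: in Self-Backtracking the LLM itself is trained to emit the backtrack signal, whereas here a toy `propose_steps` function stands in for the model's step proposals.

```python
def propose_steps(state):
    """Toy stand-in for an LLM proposing candidate next reasoning steps.
    A real model would generate and score these steps itself."""
    tree = {
        "": ["a", "b"],
        "a": ["a1"],           # this branch dead-ends below
        "b": ["b1", "b2"],
        "b2": ["GOAL"],
    }
    return tree.get(state, [])

def is_goal(state):
    return state == "GOAL"

def self_backtracking_search(state="", depth=0, max_depth=5):
    """Depth-first search with backtracking: when a path yields no
    candidates (or the depth budget is spent), step back to the parent
    and try the next alternative."""
    if is_goal(state):
        return [state]
    if depth >= max_depth:
        return None                       # suboptimal path: backtrack
    for step in propose_steps(state):
        path = self_backtracking_search(step, depth + 1, max_depth)
        if path is not None:
            return [state] + path         # solution found below this step
    return None                           # all children failed: backtrack

print(self_backtracking_search())  # → ['', 'b', 'b2', 'GOAL']
```

The search abandons branch "a" after its dead end and backtracks to try "b", mirroring (in miniature) the recognize-and-backtrack behavior the paper trains into the model.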
Paper link: https://www.arxiv.org/abs/2502.04404