
Seventy3: Using NotebookLM to turn research papers into podcasts, so everyone can make progress alongside AI.
Today's topic: Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability
Summary
This research paper introduces cDPO, a method for improving the reasoning capabilities of Large Language Models (LLMs). cDPO identifies "critical tokens" (tokens that largely determine whether a reasoning trajectory ends in a correct or incorrect answer) using contrastive estimation: it compares models trained on correct reasoning trajectories with models trained on incorrect ones. This enables token-level reward adjustments during preference optimization, which improves accuracy. Experiments on the GSM8K and MATH500 benchmarks with Llama-3 and DeepSeek-math models show cDPO outperforming the existing methods it is compared against. The paper also examines the effect of several hyperparameters and compares the approach in depth with related techniques in contrastive estimation and reinforcement learning. The findings suggest that focusing on critical tokens substantially improves LLM reasoning accuracy.
Original paper: https://arxiv.org/abs/2411.19943
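The summary describes cDPO only at a high level. As a rough illustration of the general idea, the Python sketch below assumes per-token log-probabilities are already available from two fine-tuned models (one trained on correct trajectories, one on incorrect ones). It scores each token by the difference of its log-probabilities under the two models and uses sigmoid-squashed scores as per-token weights in a DPO-style preference loss. The function names, the sigmoid weighting, and the `temperature` and `beta` parameters are hypothetical choices made for illustration; this is a simplified sketch, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def contrastive_token_scores(logp_correct: Sequence[float],
                             logp_incorrect: Sequence[float]) -> List[float]:
    """Score each token by how much more likely it is under a model fine-tuned
    on incorrect trajectories than under one fine-tuned on correct trajectories.
    A large positive score flags a candidate critical token.
    (Hypothetical, simplified scoring rule.)"""
    if len(logp_correct) != len(logp_incorrect):
        raise ValueError("log-prob sequences must have the same length")
    return [neg - pos for pos, neg in zip(logp_correct, logp_incorrect)]


def scores_to_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Squash scores into (0, 1) with a sigmoid (hypothetical weighting choice)."""
    return [1.0 / (1.0 + math.exp(-s / temperature)) for s in scores]


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. softplus(-x)."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def token_weighted_dpo_loss(policy_chosen: Sequence[float], ref_chosen: Sequence[float],
                            policy_rejected: Sequence[float], ref_rejected: Sequence[float],
                            rejected_weights: Sequence[float], beta: float = 0.1) -> float:
    """DPO-style loss for one preference pair. Inputs are per-token log-probs
    under the policy and a frozen reference model; per-token weights scale the
    rejected (incorrect) trajectory's log-ratio terms."""
    chosen = sum(p - r for p, r in zip(policy_chosen, ref_chosen))
    rejected = sum(w * (p - r)
                   for w, p, r in zip(rejected_weights, policy_rejected, ref_rejected))
    margin = beta * (chosen - rejected)
    return neg_log_sigmoid(margin)
```

For an incorrect trajectory, one would compute `scores_to_weights(contrastive_token_scores(...))` and pass the result as `rejected_weights`. Weighting only the rejected side concentrates the penalty on tokens that the incorrect-trajectory model favors; how the paper actually defines and applies its token-level rewards should be checked against the original.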