
This research paper investigates challenges in using process reward models (PRMs) for reinforcement fine-tuning (RFT) of large language models (LLMs) on reasoning tasks, specifically the reward hacking caused by traditional summation-based credit assignment. To mitigate this, the authors introduce PURE (Process sUpervised Reinforcement lEarning), a framework that uses min-form credit assignment, valuing a step by the minimum future reward, which leads to more stable and efficient training. Their experiments show that PRM-based RFT with PURE matches or exceeds the reasoning performance of methods trained on verifiable rewards, and that combining PRMs with a small amount of verifiable rewards further improves performance and reduces reward hacking. The paper also analyzes several cases of reward hacking and the causes of training collapse, offering insights for future research on PRM-based RFT.
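To make the contrast concrete, here is a minimal Python sketch (not the authors' code) of the two credit-assignment schemes the summary describes: summation-based returns, which can be inflated by padding a trace with many mildly positive steps, versus min-form returns, which are bounded by the worst remaining step. The PRM scores used below are hypothetical, purely for illustration.

```python
from typing import List


def sum_form_returns(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return at each step as the (discounted) sum of remaining rewards.

    Summation-based credit assignment: a step's value is the total of all
    rewards that follow it, which a policy can inflate by adding extra
    mildly positive steps (a reward-hacking pattern discussed in the paper).
    """
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


def min_form_returns(step_rewards: List[float]) -> List[float]:
    """Return at each step as the minimum reward over the remaining steps.

    Min-form credit assignment: a step's value is capped by the worst future
    step, so appending low-quality steps cannot raise the return.
    """
    returns = [0.0] * len(step_rewards)
    running = float("inf")
    for t in reversed(range(len(step_rewards))):
        running = min(step_rewards[t], running)
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Hypothetical PRM scores for a 5-step reasoning trace.
    prm_scores = [0.9, 0.8, 0.2, 0.7, 0.9]
    print("sum-form:", sum_form_returns(prm_scores))
    print("min-form:", min_form_returns(prm_scores))
```

Under the sum-form scheme every extra positive step raises earlier returns, whereas the min-form scheme keeps them pinned at the weakest future step (0.2 in this example), which is the intuition behind the more stable training the paper reports.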