Best AI papers explained

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF



This paper investigates two major failure modes in the reward-learning phase of RLHF: reward overfitting and reward overoptimization, which often arise because the standard cross-entropy loss handles imbalanced preference datasets poorly. To address these issues, the paper introduces Iterative Data Smoothing (IDS), an algorithm that iteratively replaces hard comparison labels with softer, model-predicted labels during training. Theoretical analysis and experiments in both multi-armed bandit and neural-network settings show that IDS outperforms standard Maximum Likelihood Estimation (MLE), yielding a more robust reward-training procedure.
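To make the idea concrete, here is a minimal sketch of IDS-style label smoothing in a toy multi-armed bandit setting. The true rewards, learning rate, smoothing rate `beta`, and sample counts below are hypothetical illustration choices, not values from the paper; the preference data is generated from a Bradley-Terry model, and each epoch alternates a cross-entropy gradient step with a blend of the labels toward the current model's predictions.

```python
import math
import random

random.seed(0)

# Toy bandit: hidden true arm rewards (hypothetical values for illustration).
true_r = [1.0, 0.5, 0.0]
K = len(true_r)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Sample pairwise comparisons with Bradley-Terry preference probabilities.
# Each record holds [arm_a, arm_b, label]; label 1.0 means arm_a was preferred.
data = []
for _ in range(300):
    a, b = random.sample(range(K), 2)
    label = 1.0 if random.random() < sigmoid(true_r[a] - true_r[b]) else 0.0
    data.append([a, b, label])

r = [0.0] * K          # learned per-arm reward estimates
lr, beta = 0.05, 0.3   # step size and label-smoothing rate (hypothetical)

for epoch in range(200):
    # 1) Gradient step on cross-entropy against the current (soft) labels.
    for a, b, y in data:
        p = sigmoid(r[a] - r[b])
        g = p - y              # gradient of the loss w.r.t. (r[a] - r[b])
        r[a] -= lr * g
        r[b] += lr * g
    # 2) IDS-style update: blend the hard labels toward model predictions,
    #    so rarely-compared pairs are no longer fit to extreme 0/1 targets.
    for pair in data:
        a, b, y = pair
        pair[2] = (1 - beta) * y + beta * sigmoid(r[a] - r[b])

ranking = sorted(range(K), key=lambda i: -r[i])
print(ranking)  # learned preference ordering of the arms
```

The key design point is step 2: by repeatedly softening labels toward the model's own predictions, the learned rewards for under-sampled comparisons stop being driven to extreme values, which is the overfitting behavior plain MLE exhibits.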


Best AI papers explained, by Enoch H. Kang