Best AI papers explained

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF



This paper investigates two major failure modes in the reward-learning phase of RLHF: reward overfitting and reward overoptimization, which often arise because the standard cross-entropy loss handles imbalanced preference datasets poorly. To address these issues, the paper introduces Iterative Data Smoothing (IDS), an algorithm that iteratively replaces hard comparison labels with softer, model-predicted labels during training. Theoretical analysis and experiments in both multi-armed bandit and neural-network settings show that IDS outperforms standard Maximum Likelihood Estimation (MLE), yielding a more robust reward-training procedure.
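To make the idea concrete, here is a minimal sketch of IDS-style label smoothing in a toy multi-armed bandit setting. The true rewards, learning rate, smoothing rate `beta`, and sample counts below are hypothetical illustration choices, not values from the paper; the preference data is generated from a Bradley-Terry model, and each epoch alternates a cross-entropy gradient step with a blend of the labels toward the current model's predictions.

```python
import math
import random

random.seed(0)

# Toy bandit: hidden true arm rewards (hypothetical values for illustration).
true_r = [1.0, 0.5, 0.0]
K = len(true_r)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Sample pairwise comparisons with Bradley-Terry preference probabilities.
# Each record holds [arm_a, arm_b, label]; label 1.0 means arm_a was preferred.
data = []
for _ in range(300):
    a, b = random.sample(range(K), 2)
    label = 1.0 if random.random() < sigmoid(true_r[a] - true_r[b]) else 0.0
    data.append([a, b, label])

r = [0.0] * K          # learned per-arm reward estimates
lr, beta = 0.05, 0.3   # step size and label-smoothing rate (hypothetical)

for epoch in range(200):
    # 1) Gradient step on cross-entropy against the current (soft) labels.
    for a, b, y in data:
        p = sigmoid(r[a] - r[b])
        g = p - y              # gradient of the loss w.r.t. (r[a] - r[b])
        r[a] -= lr * g
        r[b] += lr * g
    # 2) IDS-style update: blend the hard labels toward model predictions,
    #    so rarely-compared pairs are no longer fit to extreme 0/1 targets.
    for pair in data:
        a, b, y = pair
        pair[2] = (1 - beta) * y + beta * sigmoid(r[a] - r[b])

ranking = sorted(range(K), key=lambda i: -r[i])
print(ranking)  # learned preference ordering of the arms
```

The key design point is step 2: by repeatedly softening labels toward the model's own predictions, the learned rewards for under-sampled comparisons stop being driven to extreme values, which is the overfitting behavior plain MLE exhibits.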


Best AI papers explained, by Enoch H. Kang