
This episode introduces Confidence-Reward driven Preference Optimization (CRPO), a novel method for improving machine translation by selecting more informative training data for large language models (LLMs). The paper highlights the challenges of applying LLMs to translation, given their pretraining on English-centric data and the complexity of traditional reinforcement learning from human feedback. While Direct Preference Optimization (DPO) simplifies training, its success hinges on high-quality preference data. CRPO addresses this by combining reward scores with model confidence to identify challenging sentence pairs on which the model is uncertain or underperforming, leading to more efficient fine-tuning. The authors demonstrate CRPO's effectiveness on both LLMs and encoder-decoder models, showing that it outperforms existing data selection methods in translation accuracy and data efficiency.
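The selection idea can be sketched in a few lines: score each candidate preference pair by how strongly the reward model prefers one translation relative to how confident the policy already is in that preference, then keep the highest-scoring (most challenging) pairs for preference fine-tuning. This is a minimal sketch under assumed details; the scoring rule, field names, and functions below are illustrative, not the paper's exact formulation.

```python
def confidence_reward_score(reward_chosen, reward_rejected,
                            logprob_chosen, logprob_rejected):
    """Score a (chosen, rejected) translation pair.

    A high score means the reward model clearly prefers the chosen
    translation, but the policy's own log-probabilities do not yet
    reflect that preference, i.e. the pair is still challenging.
    (Assumed scoring rule: reward margin minus confidence margin.)
    """
    reward_margin = reward_chosen - reward_rejected
    confidence_margin = logprob_chosen - logprob_rejected
    return reward_margin - confidence_margin


def select_pairs(candidates, k):
    """Keep the k highest-scoring pairs for preference fine-tuning."""
    scored = sorted(
        candidates,
        key=lambda c: confidence_reward_score(
            c["reward_chosen"], c["reward_rejected"],
            c["logprob_chosen"], c["logprob_rejected"]),
        reverse=True,
    )
    return scored[:k]


if __name__ == "__main__":
    pairs = [
        # Model already confident and the reward agrees: low priority.
        {"reward_chosen": 0.9, "reward_rejected": 0.2,
         "logprob_chosen": -5.0, "logprob_rejected": -12.0},
        # Reward prefers chosen but the model is uncertain: high priority.
        {"reward_chosen": 0.8, "reward_rejected": 0.3,
         "logprob_chosen": -9.0, "logprob_rejected": -8.5},
    ]
    print(select_pairs(pairs, k=1))
```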