Neural intel Pod

Confidence-Reward Preference Optimization for Machine Translation

This episode introduces Confidence-Reward driven Preference Optimization (CRPO), a novel method for improving machine translation by selecting training data for large language models (LLMs) more effectively. The paper highlights the challenges of applying LLMs to translation, which stem from pretraining on English-centric data and the complexity of traditional reinforcement learning from human feedback (RLHF). While Direct Preference Optimization (DPO) simplifies training, its success depends on high-quality preference data. CRPO addresses this by combining reward scores with model confidence to identify challenging sentence pairs, i.e., those where the model is uncertain or underperforming, leading to more efficient fine-tuning. The authors demonstrate CRPO's effectiveness on both LLMs and encoder-decoder models, showing that it outperforms existing data selection methods in translation accuracy and data efficiency.
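The summary describes the selection criterion only at a high level. As a rough illustration, here is a minimal Python sketch of one way a confidence-reward selection rule could look: each candidate pair is scored so that a large reward margin combined with a small (or negative) policy log-probability margin ranks highest, keeping for DPO fine-tuning exactly the pairs the reward model separates clearly but the model itself is still uncertain about. The names, the linear weighting, and the top-k cutoff are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal, hypothetical sketch of confidence-reward data selection for DPO.
# Assumptions not taken from the episode: reward scores come from an external
# reward model (e.g., a translation quality estimator), "confidence" is the
# policy's own log-probability of each candidate, and the weight `lam` and all
# names below are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class PreferencePair:
    source: str             # source sentence
    chosen: str             # higher-reward translation
    rejected: str           # lower-reward translation
    reward_chosen: float    # reward-model score of the chosen translation
    reward_rejected: float  # reward-model score of the rejected translation
    logp_chosen: float      # policy log-probability of the chosen translation
    logp_rejected: float    # policy log-probability of the rejected translation

def crpo_style_score(pair: PreferencePair, lam: float = 1.0) -> float:
    """Higher score = more 'challenging': the reward model separates the
    pair clearly, but the policy's confidence margin is small or inverted."""
    reward_margin = pair.reward_chosen - pair.reward_rejected
    confidence_margin = pair.logp_chosen - pair.logp_rejected
    return reward_margin - lam * confidence_margin

def select_training_pairs(pairs: List[PreferencePair], k: int,
                          lam: float = 1.0) -> List[PreferencePair]:
    """Keep the top-k most challenging pairs as DPO training data."""
    return sorted(pairs, key=lambda p: crpo_style_score(p, lam),
                  reverse=True)[:k]
```

On this view, a pair the model already ranks the same way as the reward model contributes little new signal, while a pair it ranks the opposite way (negative confidence margin) is the most informative to fine-tune on.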

Neural intel Pod, by Neural Intelligence Network