April 03, 2025

Sharpe Ratio-Guided Active Learning for Preference Optimization

19 minutes

This research paper introduces a novel active learning method called SHARP (SHarpe Ratio-based Active Requested Preferences) and its weighted variant W-SHARP for efficiently collecting human feedback to train large language models using Direct Preference Optimization (DPO). This method uses the Sharpe ratio to assess the potential impact and risk associated with labeling different prompt-response pairs, aiming to select the most informative data points for annotation. The paper derives a computationally efficient, closed-form expression for this selection criterion and demonstrates through experiments on various models and datasets that SHARP can outperform standard DPO with limited labeled data. The work contributes a risk-aware data selection strategy for preference learning in reinforcement learning from human feedback.
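As a rough illustration of the idea, the sketch below scores unlabeled pairs with a Sharpe-style ratio (expected impact divided by its standard deviation) and picks the top few for annotation. It is not the paper's closed-form criterion. It is a toy version under loud assumptions: the input `margin` is taken to be the model's implicit DPO reward margin between two responses, the "impact" of a label is taken to be the DPO gradient scale for that label, and the function names `sharpe_score` and `select_for_labeling` are hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sharpe_score(margin, eps=1e-8):
    """Toy Sharpe-style score for one unlabeled prompt-response pair.

    margin: assumed implicit DPO reward margin (response A minus response B).
    """
    p = _sigmoid(margin)  # model's belief that response A is preferred

    # Assumption: impact of a label = DPO gradient scale under that label.
    impact_if_a = 1.0 - p  # label agrees with A
    impact_if_b = p        # label agrees with B

    # Mean and variance of the impact over the two possible labels.
    mean = p * impact_if_a + (1.0 - p) * impact_if_b
    var = p * (1.0 - p) * (impact_if_a - impact_if_b) ** 2

    # Sharpe ratio: expected impact per unit of risk (eps avoids 0/0).
    return mean / (math.sqrt(var) + eps)


def select_for_labeling(margins, k):
    """Return indices of the k pairs with the highest toy Sharpe score."""
    scores = [sharpe_score(m) for m in margins]
    ranked = sorted(range(len(margins)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Under these simplifying assumptions the score depends only on how far the margin is from zero, so it behaves much like uncertainty sampling; the criterion described in the paper is presumably richer than this.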

By Enoch H. Kang
