Best AI papers explained

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models



This paper introduces Multi-Objective Preference Optimization (MOPO), a novel algorithm for aligning large language models with complex human preferences that involve multiple, potentially conflicting goals such as helpfulness and harmlessness. Unlike prior methods that often reduce multi-objective alignment to a single score, MOPO frames the problem as a constrained optimization: it maximizes a primary objective while requiring secondary objectives to meet specified thresholds. Through synthetic and real-world experiments, the paper demonstrates that MOPO effectively approximates the Pareto front, the set of optimal trade-offs between objectives, and achieves a better balance across preference dimensions than existing techniques while remaining robust across different settings.
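
To make the constrained framing concrete, here is an illustrative formulation; the notation is an assumption for exposition, not the paper's own. With reward objectives r_1, ..., r_K for a policy pi_theta, one objective is maximized while the others must clear user-chosen thresholds.

```latex
% Illustrative constrained multi-objective alignment problem (notation assumed, not the paper's).
% r_1 is the primary objective; r_2, ..., r_K are secondary objectives with thresholds b_2, ..., b_K.
\begin{aligned}
\max_{\theta} \quad & \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[ r_1(x, y) \big] \\
\text{s.t.} \quad   & \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[ r_k(x, y) \big] \ge b_k,
\qquad k = 2, \dots, K.
\end{aligned}
% Sweeping the thresholds b_k traces out different trade-off points,
% which is how a constrained formulation can approximate the Pareto front.
```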


By Enoch H. Kang