June 09, 2025

Personalized Preference Learning with MiCRo

47 minutes

This academic paper introduces MiCRo, a two-stage framework designed to improve how Large Language Models (LLMs) learn and adapt to diverse human preferences, moving beyond the traditional assumption of a single universal preference. Reward modeling, a key component in aligning LLMs with human feedback, typically uses a single model that struggles with varied preferences. The authors demonstrate theoretically that relying on a single model for diverse preferences leads to unavoidable errors. MiCRo addresses this by first using a mixture model to identify different preference subgroups from standard binary data, then employing a context-aware routing strategy to personalize responses dynamically based on limited user information. Experimental results indicate that MiCRo effectively captures diverse preferences and enhances personalized performance compared to single models and other baselines.
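The two stages described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each preference subgroup is represented by its own Bradley-Terry reward head, and that the router produces one logit per head from the user's context. The function names (`mixture_preference_prob`, `softmax`, `sigmoid`) and all numbers are hypothetical, chosen only to show how routing weights change the predicted preference when subgroups disagree.

```python
import math


def sigmoid(z):
    """Logistic function, used here as the Bradley-Terry link."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Turn router logits into mixture weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_preference_prob(rewards_a, rewards_b, router_logits):
    """Probability that response A is preferred over response B.

    rewards_a[k], rewards_b[k]: scores from the k-th reward head
    (assumed: one head per latent preference subgroup).
    router_logits[k]: context-dependent logit for head k
    (assumed: produced by some router given user information).
    """
    weights = softmax(router_logits)
    return sum(
        w * sigmoid(ra - rb)
        for w, ra, rb in zip(weights, rewards_a, rewards_b)
    )


# Two heads that disagree: head 0 favors A, head 1 favors B.
rewards_a = [2.0, -2.0]
rewards_b = [0.0, 0.0]

# With no context (uniform routing) the disagreement averages out to about 0.5.
print(mixture_preference_prob(rewards_a, rewards_b, [0.0, 0.0]))

# With context pointing to head 0, the prediction follows that subgroup.
print(mixture_preference_prob(rewards_a, rewards_b, [5.0, 0.0]))
```

The uniform-routing case mirrors the single-model limitation the summary mentions: when subgroups hold opposing preferences, an averaged prediction carries little information, while context-dependent weights let the prediction follow one subgroup.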


By Neuralintel.org
