Best AI papers explained

Learning to summarize user information for personalized RLHF



This paper introduces Preference Learning Using Summarization (PLUS), a novel framework for personalizing large language models (LLMs) by aligning them with diverse user preferences. Unlike standard methods that assume a single set of values for all users, PLUS uses reinforcement learning to generate concise, text-based summaries of a user's characteristics and past interactions. These summaries then condition a reward model, enabling it to make more accurate, personalized predictions about what a specific user values in a response. A central innovation of PLUS is its online co-adaptation loop, in which the summarizer and the reward model are trained simultaneously so that the text summaries capture the information most relevant to predicting preferences. Experiments demonstrate that PLUS significantly improves reward model accuracy and generalizes to new topics and users better than existing techniques. Furthermore, the framework can personalize proprietary models like GPT-4 without further fine-tuning, while its human-readable summaries enhance transparency and user control.
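To make the idea concrete, here is a minimal toy sketch of a summary-conditioned reward model. All names and logic are illustrative assumptions, not the paper's implementation: the real PLUS summarizer is an LLM trained with reinforcement learning, and the reward model is learned, whereas this sketch uses a trivial word-overlap score purely to show how a text summary can condition reward predictions per user.

```python
# Toy sketch of summary-conditioned reward scoring (illustrative only;
# function names and logic are assumptions, not from the PLUS paper).

def summarize_user(interactions, max_items=3):
    """Toy 'summarizer': keep the most recent positively rated traits
    as a short text summary of the user."""
    liked = [text for text, liked_flag in interactions if liked_flag]
    return "; ".join(liked[-max_items:])

def reward(summary, response):
    """Toy 'reward model': score a response by word overlap with the
    user summary. A real reward model would be a learned network that
    takes the summary as conditioning context."""
    summary_words = set(summary.lower().replace(";", " ").split())
    response_words = set(response.lower().split())
    return len(summary_words & response_words)

# Hypothetical interaction history: (feedback text, was it positive?)
history = [
    ("prefers concise answers", True),
    ("dislikes jargon", False),
    ("likes code examples", True),
]

summary = summarize_user(history)
r_a = reward(summary, "Here is a concise answer with code examples")
r_b = reward(summary, "A long theoretical discussion")
print(summary)   # the human-readable summary conditioning the reward
print(r_a, r_b)  # the personalized response scores higher
```

Because the conditioning signal is plain text, the same summary could be prepended to a prompt for a frozen proprietary model, which is how the paper personalizes models like GPT-4 without fine-tuning.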


Best AI papers explained, by Enoch H. Kang