"Creative Preference Optimization" by Mete Ismayilzada, Antonio Laverghetta Jr., Simone A. Luchini, Reet Patel, Antoine Bosselut, Lonneke van der Plas, Roger Beaty
Summary
This document introduces Creative Preference Optimization (CRPO), a novel method designed to enhance the creativity of Large Language Models (LLMs). The authors argue that existing methods focus too narrowly on single aspects of creativity, and they propose CRPO as a modular approach that integrates signals from multiple creativity dimensions—novelty, diversity, surprise, and quality—into the preference optimization process. To train and evaluate their models, they also present MUCE, a new large-scale dataset of human creativity assessments. Their experiments show that models trained with CRPO outperform baseline LLMs, including strong commercial models, at generating content that is more novel, diverse, and surprising while maintaining high quality. This suggests that directly optimizing for creativity within preference frameworks is a promising direction.
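To make the core idea concrete, the multi-dimension signal integration described above can be sketched as combining per-dimension creativity scores into a single reward and using it to rank candidate responses into (chosen, rejected) preference pairs, as one might do before DPO-style preference optimization. The scorer inputs, weights, and function names here are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch: aggregate per-dimension creativity scores into one
# reward, then form a (chosen, rejected) preference pair for preference
# optimization. The weights and score values below are made up for
# illustration; the paper's actual scoring pipeline may differ.

def combined_reward(scores, weights):
    """Weighted sum over creativity dimensions (novelty, diversity, surprise, quality)."""
    return sum(weights[dim] * scores[dim] for dim in weights)

def build_preference_pair(candidates, weights):
    """Rank candidates by combined creativity reward; return (chosen, rejected) texts."""
    ranked = sorted(
        candidates,
        key=lambda c: combined_reward(c["scores"], weights),
        reverse=True,
    )
    return ranked[0]["text"], ranked[-1]["text"]

# Equal weights across the four dimensions named in the summary (an assumption).
weights = {"novelty": 0.25, "diversity": 0.25, "surprise": 0.25, "quality": 0.25}

candidates = [
    {"text": "A mundane answer",
     "scores": {"novelty": 0.2, "diversity": 0.3, "surprise": 0.1, "quality": 0.9}},
    {"text": "An inventive answer",
     "scores": {"novelty": 0.9, "diversity": 0.8, "surprise": 0.7, "quality": 0.8}},
]

chosen, rejected = build_preference_pair(candidates, weights)
# The higher-scoring response becomes "chosen", the lower-scoring one "rejected".
```

Because the aggregation is just a weighted sum over named dimensions, the approach stays modular: individual dimensions can be re-weighted or dropped without changing the downstream preference-optimization step.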