Marketing^AI

VLMs for Image Scoring and Self-Explanation



This research presents a training method for Vision Language Models (VLMs) that improves their ability both to score images and to explain those scores in natural language. Starting from an existing image scoring dataset and an instruction-tuned VLM, the approach relies on self-training and requires no additional external data or models. A key step is building a preference dataset for Direct Preference Optimization (DPO) that strengthens the alignment between predicted scores and the generated text justifications. The VLM is trained iteratively on two self-generated datasets and the resulting models are merged, which improves both the accuracy of image scoring and the consistency of the accompanying explanations.
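For listeners who want a concrete picture of the two building blocks named above, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `beta` value, and the choice of linear weight averaging for the merge step are all assumptions made for illustration, since the episode summary does not say how the models are merged. The first function is the standard DPO loss for a single preference pair, and the second averages two sets of weights.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (for example a score plus its explanation) under the policy being
    trained or under the frozen reference model. beta=0.1 is an
    assumed value, not one taken from the research.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


def merge_weights(weights_a, weights_b, alpha=0.5):
    """Linearly interpolate two models' weights (an assumed merge rule).

    Weights are represented as {parameter_name: list_of_floats} to keep
    the sketch dependency-free; a real model would use tensors.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must have the same parameter names")
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(weights_a[name], weights_b[name])]
        for name in weights_a
    }


if __name__ == "__main__":
    # The policy prefers the chosen response more than the reference does,
    # so the loss falls below the log(2) value of an undecided policy.
    print(dpo_loss(-1.0, -5.0, -2.0, -3.0))
    print(merge_weights({"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}))
```

In this framing, a "chosen" response would be an explanation that agrees with the predicted score and a "rejected" one would be an explanation that does not, though how the research actually constructs those pairs is not described in the summary.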


Marketing^AI, by Enoch H. Kang