This research presents a novel training method for Vision Language Models (VLMs) focused on improving their ability to both assign scores to images and provide natural language explanations for those scores. By leveraging an existing image scoring dataset and an instruction-tuned VLM, the approach utilizes self-training without requiring additional external data or models. A key innovation is the creation of a dataset using Direct Preference Optimization (DPO) to enhance the alignment between predicted scores and generated text justifications. Through an iterative process of training the VLM on two self-generated datasets and then merging the resulting models, the system demonstrably improves both the accuracy of image scoring and the consistency of its accompanying explanations.
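The core of DPO is a dataset of preference pairs. A minimal sketch of how score-text alignment could drive pair construction, assuming the summarized setup (all names and the selection rule here are hypothetical, not taken from the paper): for each image, sample several (score, explanation) outputs from the VLM, treat the explanation whose predicted score lies closest to the ground-truth score as "chosen" and the farthest as "rejected".

```python
# Hypothetical sketch: build a DPO preference pair from a model's own
# sampled outputs for one image. The selection rule (closest predicted
# score = chosen, farthest = rejected) is an illustrative assumption.

def build_dpo_pair(gt_score, samples):
    """samples: list of (predicted_score, explanation) tuples
    sampled from the VLM for the same image."""
    # Rank samples by how close their predicted score is to ground truth.
    ranked = sorted(samples, key=lambda s: abs(s[0] - gt_score))
    chosen, rejected = ranked[0], ranked[-1]
    return {"chosen": chosen[1], "rejected": rejected[1]}

samples = [
    (7.0, "Sharp focus and pleasing lighting."),
    (3.0, "Blurry and underexposed."),
    (5.0, "Average composition."),
]
pair = build_dpo_pair(6.5, samples)
# chosen: "Sharp focus and pleasing lighting."
# rejected: "Blurry and underexposed."
```

Pairs built this way would then be fed to a standard DPO trainer, pushing the model toward explanations whose implied quality matches the score it predicts.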