The Daily ML

Ep32. SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization



This research paper introduces SocialGPT, a modular framework that combines the perception capabilities of vision foundation models (VFMs) with the reasoning capabilities of large language models (LLMs) to identify social relationships between people in images. Unlike previous methods, which train a dedicated network end-to-end, SocialGPT uses VFMs to translate image content into a textual "social story" and then has an LLM reason over that text. The paper also proposes a prompt optimization method called Greedy Segment Prompt Optimization (GSPO), which improves LLM performance by running a gradient-guided greedy search at the level of prompt segments. SocialGPT achieves competitive results on two datasets without additional model training, and its answers are interpretable: each decision comes with a language-based explanation.
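To make the idea of a segment-level greedy search concrete, here is a minimal Python sketch. It is not the paper's GSPO implementation: the gradient guidance described above is left out and every candidate is simply scored directly. The names `greedy_segment_search` and `toy_score`, the example prompt segments, and the keyword-based scoring are all hypothetical stand-ins; a real setup would score each prompt by LLM accuracy on held-out examples.

```python
from typing import Callable, Dict, List


def greedy_segment_search(
    segments: List[str],
    candidates: Dict[int, List[str]],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Greedily improve a prompt one segment at a time.

    For each segment position (in index order), try every candidate
    replacement while holding the other segments fixed, and keep a
    replacement only if it strictly improves the score.
    """
    best = list(segments)
    best_score = score(best)
    for idx in sorted(candidates):
        for option in candidates[idx]:
            trial = best[:idx] + [option] + best[idx + 1:]
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best


def toy_score(prompt_segments: List[str]) -> float:
    """Hypothetical scorer: counts task-relevant phrases in the prompt.

    Stands in for an expensive evaluation such as LLM accuracy.
    """
    text = " ".join(prompt_segments).lower()
    keywords = ("social story", "relationship", "explain")
    return float(sum(kw in text for kw in keywords))


# Hypothetical starting prompt, split into three segments.
initial = [
    "You are a helpful assistant.",
    "Read the description.",
    "Answer the question.",
]

# Hypothetical candidate rewrites for each segment position.
candidates = {
    0: ["You are an expert in social relationship recognition."],
    1: ["Read the social story generated from the image.", "Look at the text."],
    2: ["Choose one label and explain your reasoning.", "Give a label."],
}

optimized = greedy_segment_search(initial, candidates, toy_score)
print("\n".join(optimized))
```

Because the search commits to the best option for one segment before moving to the next, it evaluates far fewer prompts than trying every combination, at the cost of possibly missing a better combination.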

By The Daily ML