AI: post transformers

Mini-o3: Scaling Reasoning for Visual Search



This September 2025 paper introduces Mini-o3, a vision-language model (VLM) built for complex visual search tasks that require multi-turn reasoning and trial-and-error exploration, which existing VLMs handle poorly. The researchers propose a three-part training recipe: the Visual Probe Dataset of challenging, high-resolution images; a pipeline that synthesizes diverse multi-turn trajectories for supervised fine-tuning; and an over-turn masking technique for reinforcement learning. Over-turn masking avoids penalizing long reasoning paths that run out of turns before finishing, which encourages deeper exploration without increasing training time. Mini-o3 achieves state-of-the-art performance on several visual search benchmarks, demonstrating adaptive visual understanding through iterative observation, thought, and action.
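To make the over-turn masking idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `Rollout` class, the group-normalized advantage, and the choice to average the loss over unmasked rollouts only are all assumptions made for illustration. The one point it is meant to show is that a rollout cut off at the turn limit gets a mask of zero, so it adds no gradient signal instead of being treated as a wrong answer.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """Hypothetical container for one sampled trajectory."""
    reward: float    # e.g. 1.0 for a correct final answer, else 0.0
    finished: bool   # False if the turn limit was hit before an answer


def masked_weights(rollouts, eps=1e-6):
    """Per-rollout loss weights: normalized advantage times over-turn mask.

    Assumption: advantages are normalized within the sampled group, and
    unfinished (over-turn) rollouts are zeroed out rather than penalized.
    """
    rewards = [r.reward for r in rollouts]
    mu, sigma = mean(rewards), pstdev(rewards)
    weights = []
    for r in rollouts:
        advantage = (r.reward - mu) / (sigma + eps)
        mask = 1.0 if r.finished else 0.0
        weights.append(advantage * mask)
    return weights


def masked_policy_loss(rollouts, logprobs):
    """Policy-gradient style loss averaged over finished rollouts only."""
    weights = masked_weights(rollouts)
    n_finished = sum(1 for r in rollouts if r.finished)
    total = sum(w * lp for w, lp in zip(weights, logprobs))
    return -total / max(n_finished, 1)


group = [
    Rollout(reward=1.0, finished=True),    # answered correctly
    Rollout(reward=0.0, finished=False),   # ran out of turns: masked
    Rollout(reward=0.0, finished=True),    # answered incorrectly
]
print(masked_weights(group))
```

In this sketch the second rollout receives weight zero, so changing its log-probability leaves the loss unchanged, while the finished correct and incorrect rollouts receive positive and negative weights respectively.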


Source:

https://arxiv.org/pdf/2509.07969


By mcgrof