September 10, 2025

Mini-o3: Scaling Reasoning for Visual Search

12 minutes

This September 2025 paper introduces Mini-o3, a Vision-Language Model (VLM) built for complex visual search tasks that require multi-turn reasoning and trial-and-error exploration, an area where existing VLMs fall short. The researchers propose a three-component training recipe: the Visual Probe Dataset of challenging, high-resolution images; a pipeline that synthesizes diverse multi-turn trajectories for supervised finetuning; and an over-turn masking technique for reinforcement learning. The masking prevents long, incomplete reasoning paths from being penalized, which encourages deeper exploration without increasing training time. Mini-o3 achieves state-of-the-art performance on several visual search benchmarks, showing stronger adaptive visual understanding through iterative observation, thought, and action. Source: https://arxiv.org/pdf/2509.07969
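
The over-turn masking idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Trajectory` fields, the `masked_advantages` function, and the group-normalized advantage formula are all assumptions chosen for illustration. The sketch only shows the general principle that a trajectory cut off at the turn limit contributes no training signal instead of a negative one.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    # Hypothetical record of one sampled rollout (names are illustrative).
    reward: float    # e.g. 1.0 for a correct final answer, 0.0 otherwise
    finished: bool   # False if the rollout hit the turn limit without answering


def masked_advantages(group, eps=1e-6):
    """Group-normalized advantages with over-turn rollouts masked out.

    Assumption: advantages are rewards standardized within the sampled
    group, and unfinished rollouts get a mask of 0 so they are neither
    rewarded nor penalized in the policy update.
    """
    rewards = [t.reward for t in group]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    result = []
    for t in group:
        advantage = (t.reward - mu) / (sigma + eps)
        mask = 1.0 if t.finished else 0.0
        result.append(advantage * mask)
    return result


group = [
    Trajectory(reward=1.0, finished=True),   # answered correctly
    Trajectory(reward=0.0, finished=False),  # ran out of turns
    Trajectory(reward=0.0, finished=True),   # answered incorrectly
]
advantages = masked_advantages(group)
```

Without the mask, the second rollout would receive the same negative advantage as the wrong answer, which would push the policy toward shorter trajectories. With the mask, only completed rollouts shape the update.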

By mcgrof