
This document introduces CityAVOS, a new benchmark dataset for Aerial Visual Object Search (AVOS) tasks performed by Unmanned Aerial Vehicles (UAVs) in realistic urban environments. It describes the challenges specific to urban AVOS, such as complex semantics and the difficulty of telling similar objects apart, which set it apart from earlier navigation and object search tasks. It also presents PRPSearcher, a novel agentic method powered by a Multi-modal Large Language Model (MLLM) that combines spatial perception, target reasoning, and action planning, using semantic, cognitive, and uncertainty maps to improve UAV search. Experimental results show that PRPSearcher outperforms existing baseline methods, though it still falls short of human performance, which points to areas for future research in autonomous urban search.