This technical report introduces DeepVerifier, a framework designed to enhance the reliability of Deep Research Agents (DRAs) through automated verification. The researchers developed a DRA Failure Taxonomy to categorize common agent errors, such as poor source selection and faulty reasoning. Leveraging the asymmetry of verification (checking an answer is easier than producing one), the system breaks complex tasks into simpler sub-questions to check for accuracy at inference time. This method allows agents to self-evolve through iterative feedback loops, significantly boosting performance on benchmarks such as GAIA and XBench-DeepSearch. The authors also released DeepVerifier-4K, a specialized dataset used to fine-tune open-source models for improved self-critique capabilities. Experiments demonstrate that this test-time scaling approach delivers substantial accuracy gains without additional heavy training.

Source: "Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification" (January 2026)
Yuxuan Wan, Tianqing Fang, Zaitang Li, Yintong Huo, Wenxuan Wang, Haitao Mi, Dong Yu, Michael R. Lyu
The Chinese University of Hong Kong, Tencent AI Lab, Singapore Management University, Renmin University of China
https://arxiv.org/pdf/2601.15808
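The verify-then-revise loop described above can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's actual implementation: `decompose`, `check`, and `revise` are hypothetical stand-ins for the agent's decomposition, rubric-guided verification, and revision steps.

```python
from typing import Callable, List

def verify_and_revise(
    answer: str,
    decompose: Callable[[str], List[str]],    # answer -> sub-questions (assumed interface)
    check: Callable[[str], bool],             # sub-question -> pass/fail (assumed interface)
    revise: Callable[[str, List[str]], str],  # answer + failed checks -> new answer
    max_rounds: int = 3,
) -> str:
    """Iteratively refine an answer with verifier feedback (test-time scaling sketch)."""
    for _ in range(max_rounds):
        # Break the answer into simpler sub-questions and verify each one.
        failed = [q for q in decompose(answer) if not check(q)]
        if not failed:
            return answer                     # all sub-questions verified: accept
        answer = revise(answer, failed)       # feed failures back for another pass
    return answer                             # budget exhausted: return best attempt

# Toy demo: the "verifier" requires the answer to state a unit.
demo = verify_and_revise(
    "The bridge is 300",
    decompose=lambda a: [f"Does '{a}' state a unit?"],
    check=lambda q: "metres" in q,
    revise=lambda a, _: a + " metres",
)
print(demo)  # -> "The bridge is 300 metres"
```

The loop spends extra inference-time compute on verification rather than on a larger model, which is the core of the test-time scaling argument.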