


The paper "FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents" introduces a novel dual-agent framework designed to overcome the context window limitations of large language models (LLMs) during complex, long-horizon research tasks.
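The core innovation of FS-Researcher is a persistent, file-system-based workspace that serves as external memory, letting the agents store and organize far more information than fits in a standard model's context window. The framework operates in two distinct stages: a Context Builder first assembles a knowledge base in the workspace over multiple rounds, and the final report is then written from that knowledge base.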
The Role of the File System
The workspace utilizes control files (such as todos, checklists, and logs) to track progress and coordinate between agent sessions. This structure enables iterative refinement, where agents can revisit and fix errors across multiple sessions, mirroring a human-like research workflow.
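To make the idea of control files concrete, here is a minimal sketch in Python of a workspace whose to-do list and log live on disk, so a later session can pick up where an earlier one stopped. The class name `Workspace`, the file names `todo.json` and `log.txt`, and the task strings are all assumptions made for illustration; the paper's actual file layout is not described here.

```python
import json
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical file-backed workspace; names and layout are illustrative."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.todo_path = self.root / "todo.json"  # assumed control file
        self.log_path = self.root / "log.txt"     # assumed control file
        if not self.todo_path.exists():
            self.todo_path.write_text(json.dumps([]))

    def load_todos(self):
        return json.loads(self.todo_path.read_text())

    def add_todo(self, task):
        todos = self.load_todos()
        todos.append({"task": task, "done": False})
        self.todo_path.write_text(json.dumps(todos, indent=2))

    def complete_todo(self, task):
        todos = self.load_todos()
        for item in todos:
            if item["task"] == task:
                item["done"] = True
        self.todo_path.write_text(json.dumps(todos, indent=2))

    def pending(self):
        return [t["task"] for t in self.load_todos() if not t["done"]]

    def log(self, message):
        # Append-only log so every session leaves a trace for the next one
        with self.log_path.open("a") as f:
            f.write(message + "\n")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Session 1: plan two tasks, finish one, record what happened
        first = Workspace(tmp)
        first.add_todo("collect sources")
        first.add_todo("verify claims")
        first.complete_todo("collect sources")
        first.log("session 1: collected sources")

        # Session 2: a fresh object with no in-memory state; it only
        # knows what the files on disk tell it
        second = Workspace(tmp)
        return second.pending()
```

Calling `demo()` returns the tasks the second session still sees as open. The point of the design is that nothing is carried over in memory: coordination between sessions happens entirely through the files.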
Performance and Scaling
Experimental results on benchmarks such as DeepResearch Bench and DeepConsult show that FS-Researcher achieves state-of-the-art (SOTA) report quality among both proprietary and open-source systems.
A major finding of the paper is the validation of test-time scaling: the quality of the final report correlates positively with the amount of computation (the number of iteration rounds) allocated to the Context Builder. As more rounds are invested in building the knowledge base, the resulting reports become more evidence-grounded and comprehensive.
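The round budget can be pictured as a simple knob on a loop. The sketch below, in Python, runs a stand-in session a configurable number of times, with each round adding notes to an on-disk knowledge base. The function names `build_context` and `stub_find_notes`, the `knowledge_base` directory, and the one-note-per-round behavior are assumptions for illustration only; a real session would be driven by an LLM, and the sketch shows only that the knowledge base grows with the budget, not that report quality does.

```python
import tempfile
from pathlib import Path


def build_context(root, rounds, find_notes):
    """Run `rounds` sessions; each may add new notes to the knowledge base."""
    kb = Path(root) / "knowledge_base"  # assumed directory name
    kb.mkdir(parents=True, exist_ok=True)
    for round_index in range(rounds):
        # Each session starts by listing what earlier sessions left behind
        existing = sorted(p.name for p in kb.glob("*.md"))
        for name, text in find_notes(round_index, existing):
            (kb / name).write_text(text)
    return sorted(p.name for p in kb.glob("*.md"))


def stub_find_notes(round_index, existing):
    # Stand-in for an LLM-driven research session: one new note per round
    name = f"note_{round_index}.md"
    if name in existing:
        return []
    return [(name, f"evidence gathered in round {round_index}\n")]


def demo_scaling():
    sizes = {}
    for budget in (1, 3, 5):
        with tempfile.TemporaryDirectory() as tmp:
            notes = build_context(tmp, budget, stub_find_notes)
            sizes[budget] = len(notes)
    return sizes
```

`demo_scaling()` maps each round budget to the number of notes the knowledge base ends up with, which is the quantity a larger budget buys in this toy setting.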
By Yun Wu