March 01, 2026

EP104: WebExplorer Beats Giants at Web Research

17 minutes

The paper "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" addresses the limitations of current open-source web agents, which often struggle with complex, multi-step information-seeking tasks due to a lack of challenging training data.

To solve this, the authors introduce WebExplorer, a novel data generation framework that synthesizes highly challenging query-answer (QA) pairs through a two-step process:

Model-Based Exploration: Instead of manually building complex knowledge graphs, the framework uses large language models (LLMs) to autonomously browse and search the web starting from a seed entity, constructing an initial multi-hop QA pair based on the discovered information.
Iterative Query Evolution: Because initial QA pairs are often too easy for advanced models to solve, the system iteratively evolves the queries in a "long-to-short" manner. It intentionally increases the difficulty by removing explicit clues and replacing specific details (like exact dates or names) with vague, obfuscated descriptions, forcing the model to perform deeper, exploratory reasoning.
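The evolution step can be pictured as a small loop: while a query is still too easy, rewrite it to hide one more explicit clue, keeping the answer fixed. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `QAPair`, `evolve`, `toy_obfuscate`, `toy_too_easy`, the `CLUES` table, and the `max_rounds` cap are all hypothetical; in the framework described above, an LLM would perform the rewriting, and the toy string replacements here only stand in for it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    query: str
    answer: str


def evolve(
    initial: QAPair,
    obfuscate: Callable[[str], str],
    is_too_easy: Callable[[QAPair], bool],
    max_rounds: int = 5,  # assumed cap, chosen only for illustration
) -> QAPair:
    """Rewrite the query until it is no longer 'too easy' or the cap is hit."""
    current = initial
    for _ in range(max_rounds):
        if not is_too_easy(current):
            break
        # Only the query changes; the ground-truth answer stays the same.
        current = QAPair(obfuscate(current.query), current.answer)
    return current


# Toy stand-ins (assumptions): explicit clues and their vaguer replacements.
CLUES = {
    "in 1969": "in the late 1960s",
    "Neil Armstrong": "a former naval aviator",
}


def toy_obfuscate(query: str) -> str:
    """Replace the first explicit clue still present in the query."""
    for explicit, vague in CLUES.items():
        if explicit in query:
            return query.replace(explicit, vague)
    return query


def toy_too_easy(qa: QAPair) -> bool:
    """Treat a query as too easy while any explicit clue remains."""
    return any(explicit in qa.query for explicit in CLUES)


initial = QAPair(
    "Which spacecraft carried Neil Armstrong to the Moon in 1969?",
    "Apollo 11",
)
evolved = evolve(initial, toy_obfuscate, toy_too_easy)
print(evolved.query)
```

In a real pipeline the "too easy" check would more plausibly be driven by how often a strong model answers the query correctly, which is why it is passed in as a callable here rather than hard-coded.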

The authors then trained WebExplorer-8B (based on Qwen3-8B) on the resulting dataset in two phases: Supervised Fine-Tuning (SFT) for cold-start initialization, followed by Reinforcement Learning (RL) with the GRPO algorithm.
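The core idea of GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative advantage follows. The function name `group_relative_advantages`, the use of the population standard deviation, the `eps` stabilizer, and the example rewards are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one query (1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct rollouts end up with positive advantages and incorrect ones with negative advantages; a group where every rollout receives the same reward yields advantages of zero and so contributes no learning signal.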

Key Results:

The model supports up to a 128K context length and can handle up to 100 tool-calling turns, enabling it to solve complex, long-horizon problems.
Despite having only 8 billion parameters, WebExplorer-8B achieves state-of-the-art performance on challenging information-seeking benchmarks like BrowseComp, WebWalkerQA, and FRAMES, significantly outperforming much larger models such as WebSailor-72B.
It also generalizes well to non-web-search academic benchmarks, such as Humanity's Last Exam (HLE), which suggests the training approach is robust beyond the tasks it was trained on.

By Yun Wu
