CL · Oct 23, 2025

DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking

arXiv:2510.20168v1 · 12 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

The benchmark addresses a critical deficiency in real-world applications such as market analysis and business development, but the contribution is incremental: it benchmarks existing agents rather than proposing a new method.

The paper tackles the problem of search agents lacking the ability to simultaneously perform deep reasoning and wide-scale information collection, introducing DeepWideSearch as the first benchmark for this purpose, where state-of-the-art agents achieve only a 2.39% average success rate.

Current search agents fundamentally lack the ability to simultaneously perform deep reasoning over multi-hop retrieval and wide-scale information collection, a critical deficiency for real-world applications like comprehensive market analysis and business development. To bridge this gap, we introduce DeepWideSearch, the first benchmark explicitly designed to evaluate agents' ability to integrate depth and width in information seeking. In DeepWideSearch, agents must process a large volume of data, with each item requiring deep reasoning over multi-hop retrieval paths. Specifically, we propose two methods to convert established datasets, resulting in a curated collection of 220 questions spanning 15 diverse domains. Extensive experiments demonstrate that even state-of-the-art agents achieve only a 2.39% average success rate on DeepWideSearch, highlighting the substantial challenge of integrating depth and width search in information-seeking tasks. Furthermore, our error analysis reveals four failure modes: lack of reflection, overreliance on internal knowledge, insufficient retrieval, and context overflow, exposing key limitations in current agent architectures. We publicly release DeepWideSearch to catalyze future research on more capable and robust information-seeking agents.

