AI, Aug 26, 2025

Hybrid Deep Searcher: Integrating Parallel and Sequential Search Reasoning

arXiv:2508.19113v1, 4 citations, h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses efficiency and scalability issues in large reasoning models for complex QA tasks, representing an incremental improvement over existing sequential methods.

The paper targets the high inference latency and reduced coherence that purely sequential querying causes in large reasoning models. It introduces HDS-QA, a synthetic dataset for training models to distinguish parallelizable from sequential queries. The resulting model, HybridDeepSearcher, outperforms state-of-the-art baselines, with F1 gains of +15.9 on FanOutQA and +11.5 on a subset of BrowseComp.

Large reasoning models (LRMs) have demonstrated strong performance in complex, multi-step reasoning tasks. Existing methods enhance LRMs by sequentially integrating external knowledge retrieval; models iteratively generate queries, retrieve external information, and progressively reason over this information. However, purely sequential querying increases inference latency and context length, diminishing coherence and potentially reducing accuracy. To address these limitations, we introduce HDS-QA (Hybrid Deep Search QA), a synthetic dataset automatically generated from Natural Questions, explicitly designed to train LRMs to distinguish parallelizable from sequential queries. HDS-QA comprises hybrid-hop questions that combine parallelizable independent subqueries (executable simultaneously) and sequentially dependent subqueries (requiring step-by-step resolution), along with synthetic reasoning-querying-retrieval paths involving parallel queries. We fine-tune an LRM using HDS-QA, naming the model HybridDeepSearcher, which outperforms state-of-the-art baselines across multiple benchmarks, notably achieving +15.9 and +11.5 F1 on FanOutQA and a subset of BrowseComp, respectively, both requiring comprehensive and exhaustive search. Experimental results highlight two key advantages: HybridDeepSearcher reaches comparable accuracy with fewer search turns, significantly reducing inference latency, and it effectively scales as more turns are permitted. These results demonstrate the efficiency, scalability, and effectiveness of explicitly training LRMs to leverage hybrid parallel and sequential querying.
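The distinction between parallelizable and sequentially dependent subqueries can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `search` function, the `FAKE_INDEX` lookup table, the example question, and the turn-counting convention are all assumptions made for illustration. The sketch answers a hybrid-hop style question ("What is the capital of the more populous country, France or Germany?") by issuing the two independent population lookups in one turn, then issuing the dependent capital lookup in a second turn.

```python
import asyncio

# Hypothetical in-memory index standing in for a real retriever (assumption).
FAKE_INDEX = {
    "population of France": "68 million",
    "population of Germany": "84 million",
    "capital of Germany": "Berlin",
    "capital of France": "Paris",
}


async def search(query: str) -> str:
    """Stand-in retrieval call; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return FAKE_INDEX.get(query, "unknown")


async def run_turn(queries: list[str]) -> dict[str, str]:
    """One search turn: every query in the turn is issued concurrently."""
    results = await asyncio.gather(*(search(q) for q in queries))
    return dict(zip(queries, results))


async def hybrid_search() -> tuple[str, int]:
    turns = 0

    # Turn 1: the two population lookups are independent, so run them in parallel.
    populations = await run_turn(["population of France", "population of Germany"])
    turns += 1

    # Reasoning step: pick the country with the larger population.
    best_query = max(populations, key=lambda q: int(populations[q].split()[0]))
    country = best_query.removeprefix("population of ")

    # Turn 2: this lookup depends on the result of turn 1, so it must come after.
    capital_query = f"capital of {country}"
    capitals = await run_turn([capital_query])
    turns += 1

    return capitals[capital_query], turns


answer, turns = asyncio.run(hybrid_search())
print(answer, turns)
```

Under these assumptions, a purely sequential strategy would need three search turns (one per query), while the hybrid strategy needs two, which is the kind of turn reduction the abstract describes.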

