IR | CL | May 21, 2025

InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation

arXiv:2505.15872v2 | 11 citations | h-index: 20
Originality: Incremental advance
AI Analysis

The paper addresses the inadequate evaluation of agentic information seeking in AI research. The advance is incremental, since it builds on existing RAG and benchmarking work.

The paper tackles the lack of benchmarks for evaluating agentic retrieval-augmented generation systems in dynamic web environments by introducing InfoDeepSeek, a new benchmark with challenging queries and an evaluation framework, which reveals nuanced agent behaviors through extensive experiments.

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by grounding responses with retrieved information. As an emerging paradigm, Agentic RAG further enhances this process by introducing autonomous LLM agents into the information seeking process. However, existing benchmarks fall short in evaluating such systems, as they are confined to a static retrieval environment with a fixed, limited corpus and simple queries that fail to elicit agentic behavior. Moreover, their evaluation protocols assess information seeking effectiveness by pre-defined gold sets of documents, making them unsuitable for the open-ended and dynamic nature of real-world web environments. To bridge this gap, we present InfoDeepSeek, a new benchmark with challenging questions designed for assessing agentic information seeking in real-world, dynamic web environments. We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity. Based on this, we develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes. Through extensive experiments across LLMs, search engines, and question types, InfoDeepSeek reveals nuanced agent behaviors and offers actionable insights for future research.

