CE CL · Dec 16, 2025

SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents

arXiv:2601.03260v1 · 3.32 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a core shortcoming of retrieval agents for researchers and could enable more comprehensive literature reviews, though its contribution is incremental, since it proposes a benchmark rather than a new retrieval method.

The authors address the inability of scientific literature retrieval agents to understand relational dynamics, such as corroborating or conflicting studies, by proposing SciNetBench, a benchmark built from over 18 million AI papers. They find that current agents' accuracy on relation-aware tasks often falls below 20%, and that providing agents with relational ground truth improves literature review quality by 23.4%.

The rapid development of AI agents has spurred the creation of advanced research tools such as Deep Research. Building such tools requires a nuanced understanding of the relations within scientific literature, which goes beyond the scope of keyword-based or embedding-based retrieval. Existing retrieval agents mainly focus on content-level similarity and cannot decode critical relational dynamics, such as identifying corroborating or conflicting studies or tracing technological lineages, all of which are essential for a comprehensive literature review. This fundamental limitation often results in a fragmented knowledge structure, misleading interpretation of sentiment, and inadequate modeling of collective scientific progress. To investigate relation-aware retrieval more deeply, we propose SciNetBench, the first Scientific Network Relation-aware Benchmark for literature retrieval agents. Constructed from a corpus of over 18 million AI papers, our benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. Through extensive evaluation of three categories of retrieval agents, we find that their accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms. Notably, further experiments on literature review tasks show that providing agents with relational ground truth improves review quality by a substantial 23.4%, validating the critical importance of relation-aware retrieval. We publicly release our benchmark at https://anonymous.4open.science/r/SciNetBench/ to support future research on advanced retrieval systems.
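To make the three relation levels more concrete, here is a minimal Python sketch over a hypothetical toy citation graph. The paper IDs, the relation labels (`extends`, `supports`, `contradicts`), and the helpers `ego_relations`, `pair_relation`, `lineage_path`, and `accuracy` are illustrative assumptions, not SciNetBench's actual data format, task definitions, or scoring code. In particular, the benchmark's ego-centric level targets papers with novel knowledge structures; this sketch simplifies it to listing a paper's labeled neighbors, and it assumes exact-match accuracy.

```python
from collections import deque

# Hypothetical toy corpus: directed edges (citing paper, cited paper),
# each labeled with an assumed relation type. Illustrative only; this is
# not SciNetBench's real schema.
EDGES = {
    ("P2", "P1"): "extends",
    ("P3", "P2"): "extends",
    ("P4", "P2"): "contradicts",
    ("P5", "P2"): "supports",
}


def ego_relations(paper):
    """Ego-centric view (simplified): all papers directly linked to `paper`,
    with link direction and relation label, sorted by neighbor ID."""
    rels = []
    for (src, dst), rel in EDGES.items():
        if src == paper:
            rels.append((dst, "cites", rel))
        elif dst == paper:
            rels.append((src, "cited_by", rel))
    return sorted(rels)


def pair_relation(citing, cited):
    """Pair-wise view: the labeled relation from `citing` to `cited`, or None."""
    return EDGES.get((citing, cited))


def lineage_path(start, target):
    """Path-wise view: shortest chain of 'extends' edges from `start` back to
    `target`, found with breadth-first search; None if no such chain exists."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to `start`, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for (src, dst), rel in EDGES.items():
            if src == node and rel == "extends" and dst not in parents:
                parents[dst] = node
                queue.append(dst)
    return None


def accuracy(predictions, gold):
    """Assumed metric: fraction of exact matches between predictions and gold."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predictions, gold))
    return hits / len(gold)


if __name__ == "__main__":
    print(ego_relations("P2"))
    print(pair_relation("P4", "P2"))
    print(lineage_path("P3", "P1"))
    print(accuracy(["contradicts", "supports"], ["contradicts", "extends"]))
```

Here `lineage_path("P3", "P1")` returns `["P3", "P2", "P1"]`, a two-step technological lineage, while `pair_relation("P4", "P2")` returns `"contradicts"`. Keeping relation labels on directed edges is the design choice that lets the same graph answer all three kinds of query, which a purely similarity-based index cannot do.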
