IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review
For researchers who need accurate, content-grounded information retrieval from scientific literature, IntrAgent offers an automated approach that outperforms state-of-the-art RAG and research-agent baselines, though the task is domain-specific.
IntrAgent introduces a new task (IntraView) for fine-grained, content-grounded information retrieval from scientific literature and proposes an LLM-based agent that mimics human reading behavior. Across seven backbone LLMs, it achieves on average 13.2% higher cross-domain accuracy than state-of-the-art RAG and research-agent baselines on a new benchmark (IntraBench) spanning five STEM domains.
Scientific research relies on accurate information retrieval from literature to support analytical decisions. In this work, we introduce a new task, INformation reTRieval through literAture reVIEW (IntraView), which aims to automate fine-grained information retrieval faithfully grounded in the provided content in response to research-driven queries, and propose IntrAgent, an LLM-based agent that addresses this challenging task. In particular, IntrAgent is designed to mimic human behaviors when reading literature for information retrieval -- identifying relevant sections and then iteratively extracting key details to refine the retrieved information. It follows a two-stage pipeline: a Section Ranking stage that prioritizes relevant literature sections through structural-knowledge-enabled reasoning, and an Iterative Reading stage that continuously extracts details and synthesizes them into concise, contextually grounded answers. To support rigorous evaluation, we introduce IntraBench, a new benchmark consisting of 315 test instances built from expert-authored questions paired with literature spanning five STEM domains. Across seven backbone LLMs, IntrAgent achieves on average 13.2% higher cross-domain accuracy than state-of-the-art RAG and research-agent baselines.
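The two-stage pipeline described above (Section Ranking, then Iterative Reading) can be illustrated with a minimal Python sketch of its control flow. This is not IntrAgent's implementation, and the abstract does not specify one. In the real agent both stages are driven by an LLM. Here, as loudly labeled assumptions, keyword overlap stands in for LLM relevance judgments, a hypothetical `STRUCTURE_PRIOR` table stands in for structural-knowledge-enabled reasoning, and "synthesis" is plain concatenation of extracted sentences tagged with their source section. The names `Section`, `rank_sections`, and `iterative_read` are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str


# Toy stopword list so that function words do not count as matches (assumption).
STOPWORDS = {"a", "an", "the", "of", "for", "on", "in", "to", "and", "with",
             "was", "were", "is", "are", "we", "what", "which", "how"}

# Hypothetical structural prior standing in for "structural knowledge":
# section types that usually hold concrete details get a head start.
STRUCTURE_PRIOR = {"methods": 0.8, "results": 1.0, "introduction": 0.2}


def tokenize(text: str) -> set[str]:
    """Lowercased content words; a crude stand-in for LLM reading."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_sections(query: str, sections: list[Section]) -> list[Section]:
    """Stage 1 (Section Ranking): order sections by query-term overlap plus prior."""
    q = tokenize(query)

    def score(sec: Section) -> float:
        overlap = len(q & tokenize(sec.title + " " + sec.text))
        return overlap + STRUCTURE_PRIOR.get(sec.title.lower(), 0.5)

    return sorted(sections, key=score, reverse=True)


def iterative_read(query: str, ranked: list[Section], max_sections: int = 3) -> str:
    """Stage 2 (Iterative Reading): read ranked sections in order, keep sentences
    sharing content words with the query, and stop once a section adds nothing new."""
    q = tokenize(query)
    notes: list[str] = []
    for sec in ranked[:max_sections]:
        candidates = [
            f"{s} [{sec.title}]"  # keep provenance so the answer stays grounded
            for s in re.split(r"(?<=[.!?])\s+", sec.text.strip())
            if q & tokenize(s)
        ]
        new = [c for c in candidates if c not in notes]
        if not new and notes:
            break  # converged: this section contributed no new details
        notes.extend(new)
    # Toy "synthesis": concatenate the extracted, source-tagged sentences.
    return " ".join(notes)


if __name__ == "__main__":
    sections = [
        Section("Introduction", "Scientific literature is vast. We study retrieval."),
        Section("Methods", "We trained the model with a learning rate of 0.001. Training ran on 4 GPUs."),
        Section("Results", "Accuracy reached 91 percent on the test set."),
    ]
    query = "What learning rate was used?"
    ranked = rank_sections(query, sections)
    print([s.title for s in ranked])  # ['Methods', 'Results', 'Introduction']
    print(iterative_read(query, ranked))  # We trained the model with a learning rate of 0.001. [Methods]
```

The early-stopping rule (halt when a section yields no new details) is one simple way to model "iteratively extracting key details to refine the retrieved information"; an LLM-driven agent would instead decide when its answer is complete.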