IR · Apr 29

AgentSim: A Platform for Verifiable Agent-Trace Simulation

arXiv:2604.26653 · 77.4 · Has Code
AI Analysis

For researchers training trustworthy agentic LLMs, AgentSim provides a method to generate high-quality, grounded reasoning trajectories that go beyond outcome-only or interface-action data.

AgentSim is an open-source platform for generating verifiable, stepwise traces of RAG agent reasoning over document collections. It was used to produce the Agent-Trace Corpus (ATC), which contains over 103,000 grounded reasoning steps across three IR benchmarks and achieves a 100% grounding rate on substantive answers.

Training trustworthy agentic LLMs requires data that shows the grounded reasoning process, not just the final answer. Existing datasets fall short: question-answering data is outcome-only, chain-of-thought data is not tied to specific documents, and web-agent datasets track interface actions rather than the core retrieval and synthesis steps of a RAG workflow. We introduce AgentSim, an open-source platform for simulating RAG agents. It generates verifiable, stepwise traces of agent reasoning over any document collection. AgentSim uses a policy that makes the agent explore the document set broadly, and it combines a multi-model validation pipeline with an active human-in-the-loop process that focuses human effort on the difficult steps where models disagree. Using AgentSim, we construct and release the Agent-Trace Corpus (ATC), a large collection of grounded reasoning trajectories spanning three established IR benchmarks. We make three contributions: (1) the AgentSim platform with two mechanisms, Corpus-Aware Seeding and Active Validation, that improve trace diversity and quality; (2) the Agent-Trace Corpus (ATC), comprising over 103,000 verifiable reasoning steps across three IR benchmarks, with a 100% grounding rate on substantive answers; and (3) a comparative behavioral analysis revealing systematic differences in how state-of-the-art models approach information seeking. The platform, toolkit, and corpus are publicly available.
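The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: routing a step to human review when validator models disagree, and computing a grounding rate over substantive answers. Every name here (`TraceStep`, `needs_human_review`, `route_steps`, `grounding_rate`) and the unanimous-agreement rule are assumptions made for illustration, not AgentSim's actual API or algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    """Hypothetical record for one reasoning step in a trace."""
    step_id: str
    claim: str
    evidence_doc_ids: list[str] = field(default_factory=list)
    substantive: bool = True  # False for e.g. "no answer found" steps


def needs_human_review(verdicts: list[bool]) -> bool:
    """Assumed rule: escalate whenever the validators are not unanimous."""
    return len(set(verdicts)) > 1


def route_steps(verdicts_by_step: dict[str, list[bool]]) -> tuple[list[str], list[str]]:
    """Split step ids into (resolved automatically, sent to a human)."""
    auto: list[str] = []
    human: list[str] = []
    for step_id, verdicts in verdicts_by_step.items():
        if needs_human_review(verdicts):
            human.append(step_id)
        else:
            auto.append(step_id)
    return auto, human


def grounding_rate(steps: list[TraceStep]) -> float:
    """Fraction of substantive steps that cite at least one document."""
    substantive = [s for s in steps if s.substantive]
    if not substantive:
        return 0.0
    grounded = sum(1 for s in substantive if s.evidence_doc_ids)
    return grounded / len(substantive)


if __name__ == "__main__":
    steps = [
        TraceStep("s1", "Claim supported by d1", ["d1"]),
        TraceStep("s2", "Claim supported by d2 and d3", ["d2", "d3"]),
        TraceStep("s3", "No relevant document found", substantive=False),
    ]
    verdicts = {"s1": [True, True, True], "s2": [True, False, True]}
    print(route_steps(verdicts))
    print(grounding_rate(steps))
```

Under these assumptions, only the steps where validators split consume human attention, which is the effect the abstract attributes to Active Validation. A real system might use a graded disagreement score or a majority threshold instead of strict unanimity.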
