AI · CL · May 21

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

arXiv:2605.22905 · 65.1

Predicted impact: top 57% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For developers of self-evolving AI agents, this work addresses the problem of unreliable self-generated training data by enforcing evidence verifiability, making the training curriculum auditable and more trustworthy.

EVE-Agent introduces evidence verifiability into self-evolving search agents, ensuring each training example includes a source-grounded span whose contribution to the answer is measurable. Experiments show substantial improvements in evidence-grounded correctness over prior self-evolving search agents.

Self-evolving agents should not train on examples they cannot justify. Data-free self-evolving search agents offer a scalable route to systems that generate their own questions, answer them, and improve from their own feedback without human annotations. Yet, without verifiable evidence, this loop can reward fluent but unsupported examples, turning the self-generated curriculum into an opaque and potentially unreliable training signal. We argue that evidence verifiability is a prerequisite for trustworthy self-evolution in search agents: each generated instance should include not only an answer but also a source-grounded span whose contribution to that answer can be measured. We introduce EVE-Agent, an Evidence-Verifiable Self-Evolving Agent that operationalizes this principle through a modification to the proposer-solver framework. The proposer generates a question, an answer, and a verbatim evidence span. An evidence verifier then rewards the span according to the marginal accuracy gain when the evidence is provided. This produces a training signal that favors evidence that genuinely helps answer the question, without requiring oracle answers, human labels, or external annotations. EVE-Agent leaves the backbone model, retriever, search tool, and optimization framework unchanged. Experiments show that EVE-Agent substantially improves evidence-grounded correctness over prior self-evolving search agents. The resulting curriculum is not merely self-generated but auditable by construction: each training example carries an inspectable source span that explains why it should be trusted.
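The marginal-accuracy-gain reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the solver signature, the normalized exact-match scoring, the sample count, the zero reward for non-verbatim spans, and the toy solver are all assumptions made for illustration. The abstract states only that the span is verbatim and that it is rewarded by the accuracy gain when the evidence is provided.

```python
from typing import Callable, Optional

# Hypothetical solver interface (an assumption, not from the paper):
# takes a question and an optional evidence string, returns an answer.
Solver = Callable[[str, Optional[str]], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a lenient exact-match check."""
    return " ".join(text.lower().split())


def is_verbatim(span: str, source: str) -> bool:
    """A span counts as source-grounded only if it is a non-empty substring."""
    return bool(span) and span in source


def accuracy(solver: Solver, question: str, answer: str,
             evidence: Optional[str], n_samples: int) -> float:
    """Fraction of solver attempts that match the proposer's own answer."""
    hits = sum(
        normalize(solver(question, evidence)) == normalize(answer)
        for _ in range(n_samples)
    )
    return hits / n_samples


def evidence_reward(solver: Solver, question: str, answer: str,
                    span: str, source: str, n_samples: int = 8) -> float:
    """Accuracy with the span minus accuracy without it.

    Assumption: non-verbatim spans receive zero reward, and negative
    gains are left unclipped.
    """
    if not is_verbatim(span, source):
        return 0.0
    with_evidence = accuracy(solver, question, answer, span, n_samples)
    without_evidence = accuracy(solver, question, answer, None, n_samples)
    return with_evidence - without_evidence


def toy_solver(question: str, evidence: Optional[str]) -> str:
    """Deterministic stand-in for a real solver, used only for the demo."""
    if evidence and "Paris" in evidence:
        return "Paris"
    return "unknown"


source = "Paris is the capital of France. It lies on the Seine."
question = "What is the capital of France?"

helpful = evidence_reward(
    toy_solver, question, "Paris", "Paris is the capital of France.", source)
unhelpful = evidence_reward(
    toy_solver, question, "Paris", "It lies on the Seine.", source)
paraphrased = evidence_reward(
    toy_solver, question, "Paris", "Paris, capital of France", source)
```

In this toy setup, the span that lets the solver reach the proposer's answer earns the full gain, while a verbatim but irrelevant span and a paraphrased span both earn nothing. The reference answer is the proposer's own, so no oracle label enters the computation. A real system would sample a stochastic solver, which is why the sketch averages over `n_samples` attempts.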

