CL · May 25

AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora

Xiaoqing Wu, Feifei Li, Haoliang Ming, Wenhui Que

arXiv:2605.25382 · 24.71 citations

Predicted impact: top 33% in CL · last 90 days
Originality: Incremental advance

AI Analysis

For researchers evaluating evidence construction systems, AuthTrace provides the first unified benchmark for cross-paradigm diagnosis. It shows that evidence recall, not precision, dominates answer quality and that each paradigm collapses in its own way as fan-in grows.

AuthTrace is a diagnostic benchmark for evidence construction systems, built on thematically dense single-author corpora. It reveals that evidence recall is the dominant predictor of answer quality (r=0.96) and that full-context prompting fails uniformly, establishing evidence construction as a necessary capacity.

Evidence construction systems--chunk retrieval, agent memory, knowledge-graph traversal, and thematic indexing--are evaluated on separate benchmarks with incompatible corpora and metrics, making cross-paradigm diagnosis impossible. We introduce AuthTrace, the first diagnostic benchmark that places all major paradigms on a single corpus and query set by exploiting the dual nature of single-author collections. Built on thematically dense corpora where all texts share style, topic, and vocabulary, AuthTrace provides 2,099 instances with exhaustive gold evidence and a fan-in gradient as the primary diagnostic axis. Comparing eight systems across two QA models, we find that (1) evidence recall--not precision--is the dominant predictor of answer quality (r = 0.96); (2) fan-in exposes paradigm-specific collapse patterns, with flat retrieval degrading 3x faster than structured-evidence systems; and (3) full-context prompting fails uniformly, establishing evidence construction as a necessary capacity beyond raw corpus exposure.
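The recall-versus-precision finding can be made concrete with a small sketch. The code below is not AuthTrace's evaluation code. It assumes evidence is scored as set overlap between retrieved and gold evidence IDs and that r is a Pearson correlation across systems, neither of which the abstract states. The function names and the toy IDs are hypothetical.

```python
from math import sqrt


def evidence_precision_recall(retrieved, gold):
    """Set-overlap precision and recall of retrieved evidence IDs against gold IDs.

    Assumption: evidence units are identified by hashable IDs; this scoring
    scheme is illustrative, not taken from the paper.
    """
    retrieved, gold = set(retrieved), set(gold)
    hits = len(retrieved & gold)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy example with made-up IDs: 2 of 4 retrieved items are gold, 2 of 3 gold items are found.
precision, recall = evidence_precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumptions, one would compute a mean evidence recall and a mean answer-quality score per system, then pass the two lists to `pearson_r` to obtain a figure comparable in kind to the reported r = 0.96.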
