CL · Dec 21, 2025

On Finding Inconsistencies in Documents

Charles J. Lovering, Seth Ebner, Brandon Smock, Michael Krumdick, Saad Rabbani, Ahmed Muhammad, Varshini Reddy, Chris Tanner

arXiv:2512.18601v1 · 4.9 · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the need for automated auditing to reduce costs in academia, law, and finance, but the advance is incremental, since it builds on existing language models and benchmarks.

The paper tackles the detection of inconsistencies in long, technical documents. It introduces the FIND benchmark and evaluates language models on it: the best model (gpt-5) recovered 64% of the inserted inconsistencies, and on 50 arXiv papers, 136 of its 196 suggestions were judged to be legitimate inconsistencies missed by the original authors.

Professionals in academia, law, and finance audit their documents because inconsistencies can result in monetary, reputational, and scientific costs. Language models (LMs) have the potential to dramatically speed up this auditing process. To understand their abilities, we introduce a benchmark, FIND (Finding INconsistencies in Documents), where each example is a document with an inconsistency inserted manually by a domain expert. Despite the documents being long, technical, and complex, the best-performing model (gpt-5) recovered 64% of the inserted inconsistencies. Surprisingly, gpt-5 also found undiscovered inconsistencies present in the original documents. For example, on 50 arXiv papers, we judged 136 out of 196 of the model's suggestions to be legitimate inconsistencies missed by the original authors. However, despite these findings, even the best models miss almost half of the inconsistencies in FIND, demonstrating that inconsistency detection is still a challenging task.
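
To put the arXiv result in proportion, the short sketch below works out what share of the model's suggestions were judged legitimate. The counts (136 and 196) come from the abstract; the helper name `share` and the variable names are illustrative assumptions, not part of the paper or its benchmark code.

```python
def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction, rejecting impossible counts."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    if not 0 <= part <= whole:
        raise ValueError("part must lie between 0 and whole")
    return part / whole


# Counts reported in the abstract for the 50 arXiv papers.
legitimate_suggestions = 136  # suggestions judged to be real inconsistencies
total_suggestions = 196       # all suggestions made by the model

# Share of suggestions judged legitimate (a precision-style figure).
suggestion_precision = share(legitimate_suggestions, total_suggestions)
print(f"{suggestion_precision:.1%}")  # 69.4%
```

Under these counts, roughly seven in ten of the model's suggestions on the original papers were judged to be genuine inconsistencies.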
