CV · Feb 24

AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents

Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, Jingheng Huan

arXiv:2602.20569v1

Originality: Synthesis-oriented

AI Analysis

The work addresses a critical gap in document forensics for the financial and legal sectors, where existing detection methods are blind to AI-forged fraud. Its contribution is incremental, however, because it benchmarks existing detectors rather than proposing new detection methods.

The researchers tackled the problem of detecting AI-forged tampering in financial and form documents by building the AIForge-Doc benchmark, which shows that existing detectors degrade substantially: TruFor reaches AUC=0.751 (vs. 0.96 on NIST16), DocTamper reaches AUC=0.563 (vs. 0.98 in-distribution), and GPT-4o performs at chance level (0.509).
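The AUC figures above are easiest to read through the pairwise interpretation of ROC AUC: the probability that a randomly chosen forged image receives a higher forgery score than a randomly chosen authentic one, with ties counted as half. A value of 1.0 means perfect separation and 0.5 means chance. The sketch below assumes each detector emits one real-valued score per image and that labels are 1 for forged and 0 for authentic; `roc_auc` is a hypothetical helper, not the benchmark's evaluation code, and the summary does not state whether the reported AUCs are computed per image.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC AUC.

    Hypothetical helper: assumes label 1 = forged, 0 = authentic,
    and one real-valued forgery score per image.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one forged (1) and one authentic (0) example")
    wins = 0.0
    for p in pos:          # every forged image...
        for n in neg:      # ...against every authentic image
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])` returns 1.0, while constant scores return exactly 0.5, the chance level. The quadratic pairwise loop is chosen for clarity; a sort-based rank computation scales better to large evaluation sets.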

We present AIForge-Doc, the first benchmark dedicated exclusively to diffusion-model-based inpainting in financial and form documents, with pixel-level annotation. Existing document forgery datasets rely on traditional digital editing tools (e.g., Adobe Photoshop, GIMP), which leaves a critical gap: state-of-the-art detectors are blind to the rapidly growing threat of AI-forged document fraud. AIForge-Doc addresses this gap by systematically forging numeric fields in real-world receipt and form images with two AI inpainting APIs, Gemini 2.5 Flash Image and Ideogram v2 Edit. This yields 4,061 forged images drawn from four public document datasets (CORD, WildReceipt, SROIE, XFUND) spanning nine languages, annotated with pixel-precise tampered-region masks in DocTamper-compatible format. We benchmark three representative detectors (TruFor, DocTamper, and a zero-shot GPT-4o judge) and find that all of them degrade substantially. TruFor achieves AUC=0.751 (zero-shot, out-of-distribution) vs. AUC=0.96 on NIST16; DocTamper achieves AUC=0.563 vs. AUC=0.98 in-distribution, with a pixel-level IoU of 0.020; and GPT-4o scores only 0.509, essentially chance, confirming that AI-forged values are indistinguishable to automated detectors and VLMs. These results show that AIForge-Doc poses a qualitatively new and unsolved challenge for document forensics.
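Pixel-level IoU compares a detector's predicted tampered-region mask with the ground-truth mask: the number of pixels both mark as tampered, divided by the number of pixels either marks. An IoU of 0.020 thus means the overlap is about 2% of the combined area. The sketch below assumes masks are same-shaped 2D lists with 1 for tampered pixels, binarizes a per-pixel probability map at an assumed threshold of 0.5, and scores two empty masks as 1.0 by convention. The helper names are hypothetical, and the DocTamper-compatible on-disk mask format is not modeled.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel tamper-probability map into a 0/1 mask (threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in prob_map]


def mask_iou(pred, gt):
    """Intersection over union of two same-shaped binary masks (1 = tampered pixel)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))  # tampered in both masks
            union += int(bool(p) or bool(g))   # tampered in either mask
    # Convention (assumption): two empty masks agree perfectly.
    return inter / union if union else 1.0


# Hypothetical toy example: a 2x2 forged field; the prediction is shifted one pixel right.
gt = [[1, 1, 0, 0],
      [1, 1, 0, 0],
      [0, 0, 0, 0],
      [0, 0, 0, 0]]
pred = binarize([[0.1, 0.9, 0.7, 0.2],
                 [0.0, 0.8, 0.6, 0.1],
                 [0.0, 0.1, 0.2, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])
print(mask_iou(pred, gt))  # 2 shared pixels / 6 in the union
```

A dataset-level figure would typically average this per-image score, though how the reported 0.020 is aggregated is not stated here.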

