CV | Nov 19, 2025

FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation

arXiv:2511.14998v1 | 1 citation | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for fact-level evaluation in high-stakes financial domains, shifting the focus from lexical overlap to factual correctness. It is an incremental advance because it builds on existing OCR and vision language model frameworks.

The paper tackles the problem of evaluating OCR and vision language models on financial documents by introducing FinCriticalED, a visual benchmark with 500 image-HTML pairs and expert-annotated facts, showing that even the strongest proprietary models still have substantial errors in visually complex contexts.

We introduce FinCriticalED (Financial Critical Error Detection), a visual benchmark for evaluating OCR and vision language models on financial documents at the fact level. Financial documents contain visually dense, table-heavy layouts in which numerical and temporal information is tightly coupled with structure. In high-stakes settings, small OCR mistakes such as sign inversion or shifted dates can lead to materially different interpretations, while traditional OCR metrics like ROUGE and edit distance capture only surface-level text similarity. FinCriticalED provides 500 image-HTML pairs with expert-annotated financial facts, covering over seven hundred numerical and temporal facts. It makes three key contributions. First, it establishes the first fact-level evaluation benchmark for financial document understanding, shifting evaluation from lexical overlap to domain-critical factual correctness. Second, all annotations are created and verified by financial experts with strict quality control over signs, magnitudes, and temporal expressions. Third, we develop an LLM-as-Judge evaluation pipeline that performs structured fact extraction and contextual verification for visually complex financial documents. We benchmark OCR systems, open-source vision language models, and proprietary models on FinCriticalED. Results show that although the strongest proprietary models achieve the highest factual accuracy, substantial errors remain in visually intricate numerical and temporal contexts. Through quantitative evaluation and expert case studies, FinCriticalED provides a rigorous foundation for advancing visual factual precision in financial and other precision-critical domains.
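The gap between surface-level similarity and fact-level correctness described in the abstract can be illustrated with a minimal sketch. This is not the paper's LLM-as-Judge pipeline: the function names, the sample sentence, and the exact-substring fact check are all assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for metrics such as edit distance or ROUGE.

```python
from difflib import SequenceMatcher


def lexical_similarity(reference: str, hypothesis: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for edit-distance-style metrics."""
    return SequenceMatcher(None, reference, hypothesis).ratio()


def fact_accuracy(facts: list[str], hypothesis: str) -> float:
    """Fraction of annotated facts that appear verbatim in the OCR output.

    Hypothetical simplification: an exact substring check replaces the
    structured extraction and contextual verification done by an LLM judge.
    """
    if not facts:
        return 1.0
    hits = sum(1 for fact in facts if fact in hypothesis)
    return hits / len(facts)


# Made-up example: the OCR output drops the parentheses that mark a loss
# (sign inversion) and misreads one digit of the year (shifted date).
reference = "Net income (loss) was (1,234) for the year ended December 31, 2023."
ocr_output = "Net income (loss) was 1,234 for the year ended December 31, 2028."
facts = ["(1,234)", "December 31, 2023"]

print(f"lexical similarity: {lexical_similarity(reference, ocr_output):.3f}")
print(f"fact accuracy:      {fact_accuracy(facts, ocr_output):.3f}")
```

In this toy case the lexical score stays above 0.9 while both annotated facts are wrong, which is the kind of divergence a fact-level benchmark is meant to expose. A plain substring check would also miss valid reformattings (for example, `-1,234` for `(1,234)`), which is one reason a contextual judge is a more plausible choice in practice.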

