CV · Apr 30, 2025

Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis

arXiv:2505.00746v2 · 2 citations · h-index: 5
Originality: Incremental advance
AI Analysis

The method provides a lightweight tool for post-editing GPT-based OCR, particularly for mathematical documents. The advance is incremental, because it builds on existing entropy analysis methods.

The paper tackles the problem of localizing OCR errors in GPT-based transcriptions of mathematical documents. It develops an entropy-heat-mapping method that applies sliding-window Shannon analysis to identify high-entropy regions likely to contain errors, and it reports that the vast majority of true errors are concentrated in these regions.

Vision-language models such as OpenAI GPT-4o can transcribe mathematical documents directly from images, yet their token-level confidence signals are seldom used to pinpoint local recognition mistakes. We present an entropy-heat-mapping proof-of-concept that turns per-token Shannon entropy into a visual "uncertainty landscape". By scanning the entropy sequence with a fixed-length sliding window, we obtain hotspots that are likely to contain OCR errors such as missing symbols, mismatched braces, or garbled prose. Using a small, curated set of scanned research pages rendered at several resolutions, we compare the highlighted hotspots with the actual transcription errors produced by GPT-4o. Our analysis shows that the vast majority of true errors are indeed concentrated inside the high-entropy regions. This study demonstrates, in a minimally engineered setting, that sliding-window entropy can serve as a practical, lightweight aid for post-editing GPT-based OCR. All code and annotation guidelines are released to encourage replication and further research.
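
The sliding-window idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code. It assumes that each token comes with the log-probabilities of its top-k alternatives, that entropy is computed in bits after renormalizing those k probabilities, that a window is scored by its mean entropy, and that hotspots are the highest-scoring non-overlapping windows. The function names, the window length, and the number of hotspots are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def token_entropy(logprobs: Sequence[float]) -> float:
    """Shannon entropy in bits over one token's top-k alternatives.

    Assumption: the k probabilities are renormalized to sum to 1,
    since a top-k list does not cover the full vocabulary.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0.0:
            entropy -= q * math.log2(q)
    return entropy


def window_means(entropies: Sequence[float], window: int) -> List[float]:
    """Mean entropy of every fixed-length window, using a running sum."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(entropies) < window:
        return []
    running = sum(entropies[:window])
    means = [running / window]
    for i in range(window, len(entropies)):
        running += entropies[i] - entropies[i - window]
        means.append(running / window)
    return means


def hotspots(
    entropies: Sequence[float], window: int, top_n: int
) -> List[Tuple[int, int, float]]:
    """Up to top_n non-overlapping windows with the highest mean entropy.

    Each result is (start_index, end_index_exclusive, mean_entropy).
    """
    if top_n <= 0:
        return []
    means = window_means(entropies, window)
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    chosen: List[Tuple[int, int, float]] = []
    for start in order:
        # Skip windows that overlap a hotspot that was already selected.
        if all(abs(start - s) >= window for s, _, _ in chosen):
            chosen.append((start, start + window, means[start]))
            if len(chosen) == top_n:
                break
    return chosen
```

In use, one would map each token's top-k log-probabilities through token_entropy, pass the resulting sequence to hotspots, and send the returned token ranges to a human post-editor. Taking the mean over the window is only one possible aggregate; a sum or a maximum would rank the windows differently.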

