CV | Dec 16, 2025

SELECT: Detecting Label Errors in Real-world Scene Text Data

arXiv:2512.14050v1 | h-index: 3 | MMAsia
Originality: Incremental advance
AI Analysis

This work addresses label errors in scene text datasets, a domain-specific problem for computer vision and OCR applications. It is rated as incremental because it builds on existing error detection methods.

The paper tackles the problem of detecting label errors in real-world scene text datasets by introducing SELECT, a multi-modal approach that addresses variable-length sequence labels and character-level errors, outperforming existing methods in accuracy and practical utility.

We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-text encoder and a character-level tokenizer, SELECT addresses the issues of variable-length sequence labels, label sequence misalignment, and character-level errors, outperforming existing methods in accuracy and practical utility. In addition, we introduce Similarity-based Sequence Label Corruption (SSLC), a process that intentionally introduces errors into the training labels to mimic real-world error scenarios during training. SSLC can not only change the sequence length but also takes into account the visual similarity between characters during corruption. Our method is the first to successfully detect label errors in real-world scene text datasets while accounting for variable-length labels. Experimental results demonstrate the effectiveness of SELECT in detecting label errors and improving scene text recognition (STR) accuracy on real-world text datasets, showcasing its practical utility.
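The abstract describes SSLC only at a high level, so the following is a minimal sketch of the general idea rather than the authors' procedure. It assumes three character-level operations (substitute, delete, insert), a small hand-written table of visually confusable characters, and a per-character corruption probability `p`. The names `SIMILAR` and `corrupt_label`, the table contents, and the choice of operations are all hypothetical and do not come from the paper.

```python
import random

# Hypothetical table of visually confusable characters.
# This is an illustrative assumption, not the similarity measure used in the paper.
SIMILAR = {
    "0": "oO", "o": "0", "O": "0",
    "1": "lI", "l": "1I", "I": "1l",
    "5": "sS", "s": "5", "S": "5",
    "8": "B", "B": "8",
    "2": "Z", "Z": "2",
}


def corrupt_label(label: str, rng: random.Random, p: float = 0.3) -> str:
    """Return a corrupted copy of `label` (illustrative sketch only).

    Each character is corrupted with probability `p` by one of three
    assumed operations:
      - substitute: swap it for a visually similar character, if one is listed
      - delete:     drop it, which shortens the sequence
      - insert:     keep it and add a similar-looking character after it,
                    which lengthens the sequence
    """
    out = []
    for ch in label:
        if rng.random() >= p:
            out.append(ch)  # leave this character untouched
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute" and ch in SIMILAR:
            out.append(rng.choice(SIMILAR[ch]))
        elif op == "delete":
            continue  # character is dropped
        elif op == "insert":
            out.append(ch)
            # Fall back to duplicating the character when no look-alike is listed.
            out.append(rng.choice(SIMILAR.get(ch, ch)))
        else:
            out.append(ch)  # substitution requested but no look-alike is listed
    return "".join(out)
```

Calling `corrupt_label("S0L0", random.Random(0))` returns a string that may be shorter or longer than the input, depending on the seed. In a training setup along the lines the abstract describes, such corrupted labels would be paired with the original images as synthetic examples of label errors.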

