SELECT: Detecting Label Errors in Real-world Scene Text Data
This work addresses label errors in scene text datasets, a domain-specific problem in computer vision and OCR applications. The contribution is incremental, as it builds on existing error detection methods.
The paper tackles the problem of detecting label errors in real-world scene text datasets by introducing SELECT, a multi-modal approach that addresses variable-length sequence labels and character-level errors, outperforming existing methods in accuracy and practical utility.
We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Using an image-text encoder and a character-level tokenizer, SELECT addresses variable-length sequence labels, label sequence misalignment, and character-level errors, outperforming existing methods in accuracy and practical utility. In addition, we introduce Similarity-based Sequence Label Corruption (SSLC), a process that intentionally introduces errors into the training labels to mimic real-world error scenarios. SSLC can alter the sequence length and also accounts for the visual similarity between characters during corruption. Our method is the first to successfully detect label errors in real-world scene text datasets while accounting for variable-length labels. Experimental results demonstrate the effectiveness of SELECT in detecting label errors and in improving scene text recognition (STR) accuracy on real-world text datasets, showcasing its practical utility.
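The abstract does not specify SSLC's exact procedure, so the Python sketch below only illustrates the general idea of a label corruption that can change sequence length and favors visually similar characters. Everything in it is an assumption for illustration: the confusion table `VISUALLY_SIMILAR`, the per-character corruption probability `p`, the uniform choice among substitution, insertion, and deletion, and the function name `corrupt_label` are hypothetical and are not taken from the paper.

```python
import random
import string
from typing import Optional

# Hypothetical table of visually confusable characters (an illustrative
# assumption, not the similarity measure used by SSLC in the paper).
VISUALLY_SIMILAR = {
    "0": "OoQD", "O": "0QD", "o": "0ce", "1": "lI7", "l": "1I", "I": "1l",
    "5": "S", "S": "5", "8": "B", "B": "8", "c": "e", "e": "c",
}
ALPHABET = string.ascii_letters + string.digits


def corrupt_label(label: str, p: float = 0.1,
                  rng: Optional[random.Random] = None) -> str:
    """Return a corrupted copy of `label` (hypothetical SSLC-style sketch).

    Each character is independently perturbed with probability `p` by one of
    three operations chosen uniformly: substitution (length unchanged),
    insertion (length +1), or deletion (length -1).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for ch in label:
        if rng.random() >= p:
            out.append(ch)  # keep this character unchanged
            continue
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "substitute":
            # Prefer a visually similar replacement; otherwise use any other
            # alphanumeric character.
            candidates = VISUALLY_SIMILAR.get(ch) or ALPHABET.replace(ch, "")
            out.append(rng.choice(candidates))
        elif op == "insert":
            # Keep the character and add an extra one after it, biased toward
            # look-alike characters when the table has an entry.
            out.append(ch)
            out.append(rng.choice(VISUALLY_SIMILAR.get(ch) or ALPHABET))
        # op == "delete": append nothing, so the label gets shorter
    return "".join(out)


if __name__ == "__main__":
    # Seeded generator so the corruption is reproducible.
    print(corrupt_label("SCENE TEXT", p=0.3, rng=random.Random(0)))
```

Passing a dedicated, seeded `random.Random` keeps the corrupted labels reproducible across training runs, and because insertions and deletions are allowed, the corrupted label may be longer or shorter than the original, which is the variable-length behavior the abstract attributes to SSLC.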