CV · Jul 29, 2021

Why You Should Try the Real Data for the Scene Text Recognition

arXiv:2107.13938v1 · 11 citations · Has Code
Originality Synthesis-oriented
AI Analysis

This paper addresses the data-diversity bottleneck facing scene text recognition researchers, though the contribution is incremental: it applies an existing method to new data.

The paper tackles the problem of limited diversity in synthetic training data for scene text recognition by using the recently released OpenImages V5 annotations, achieving results comparable to state-of-the-art models and outperforming them on some datasets.

Recent work in text recognition has pushed recognition results to new levels. For a long time, however, the lack of large human-labeled datasets of natural text has forced researchers to train text recognition models on synthetic data. Even though synthetic datasets are very large (MJSynth and SynthText, the two best-known synthetic datasets, have several million images each), their diversity can be insufficient compared to natural datasets such as ICDAR. Fortunately, the recently released text-recognition annotation for the OpenImages V5 dataset has a number of instances comparable to the synthetic datasets and contains more diverse examples. We used this annotation with the text recognition head architecture from Yet Another Mask Text Spotter and obtained results comparable to the state of the art (SOTA). On some datasets we even outperformed previous SOTA models. In this paper we also introduce a text recognition model. The model's code is available.

Code Implementations · 1 repo
