CV · Mar 24, 2020

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

arXiv:2003.10608v6 · 73 citations · Has Code
AI Analysis

UnrealText addresses the need for cheaper, high-quality synthetic training data in computer vision, particularly for scene text analysis. The contribution is incremental, since it builds on existing synthetic data approaches.

The paper tackles the problem of expensive manual annotation for scene text detection by introducing UnrealText, a method that synthesizes realistic scene text images using a 3D graphics engine, which improves both detection and recognition performance as verified in experiments.

Synthetic data has been a critical tool for training scene text detection and recognition models. On the one hand, synthetic word images have proven to be a successful substitute for real images in training scene text recognizers. On the other hand, scene text detectors still rely heavily on large amounts of manually annotated real-world images, which are expensive to obtain. In this paper, we introduce UnrealText, an efficient image synthesis method that renders realistic images via a 3D graphics engine. The 3D synthetic engine provides a realistic appearance by rendering scene and text as a whole, and allows for better text region proposals with access to precise scene information, e.g., surface normals and even object meshes. Comprehensive experiments verify its effectiveness on both scene text detection and recognition. We also generate a multilingual version for future research into multilingual scene text detection and recognition. Additionally, we re-annotate scene text recognition datasets in a case-sensitive way and include punctuation marks for more comprehensive evaluations. The code and the generated datasets are released at https://github.com/Jyouhou/UnrealText/.
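
To make the abstract's point about normal-based text region proposals concrete, here is a minimal sketch of how a per-pixel normal map could be used to find flat surface patches where text might be placed. This is not the UnrealText implementation: the function name `propose_flat_regions`, the fixed window size, the cosine threshold, and the toy normal map are all assumptions made for illustration, and the sketch ignores the mesh-level information the abstract also mentions.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length; return zeros for a zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else (0.0, 0.0, 0.0)

def propose_flat_regions(normal_map, win=2, min_cos=0.98):
    """Return (row, col, height, width) windows whose normals are nearly parallel.

    Hypothetical helper: a window is accepted when every normal in it has
    cosine similarity of at least `min_cos` with the window's mean normal.
    Windows are non-overlapping (stride equals `win`).
    """
    rows, cols = len(normal_map), len(normal_map[0])
    proposals = []
    for r in range(0, rows - win + 1, win):
        for c in range(0, cols - win + 1, win):
            patch = [
                normalize(normal_map[r + i][c + j])
                for i in range(win)
                for j in range(win)
            ]
            mean = normalize(tuple(sum(p[k] for p in patch) for k in range(3)))
            if all(sum(a * b for a, b in zip(p, mean)) >= min_cos for p in patch):
                proposals.append((r, c, win, win))
    return proposals

# Toy 4x4 normal map (an assumption, not real engine output):
# everything faces the camera except the last column of the bottom two rows,
# which belongs to a side wall.
FRONT, SIDE = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
demo = [
    [FRONT, FRONT, FRONT, FRONT],
    [FRONT, FRONT, FRONT, FRONT],
    [FRONT, FRONT, FRONT, SIDE],
    [FRONT, FRONT, FRONT, SIDE],
]

print(propose_flat_regions(demo))
```

In this toy case the bottom-right window straddles two surfaces, so it is rejected, while the other three windows are kept as candidate regions. A real pipeline would presumably refine such 2D candidates further using 3D scene geometry.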

