CV · Jun 22, 2020

Text Recognition in Real Scenarios with a Few Labeled Samples

arXiv:2006.12209v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a practical bottleneck for applications that need accurate text recognition with limited labeled data. It is an incremental improvement over existing methods.

The paper tackles scene text recognition in real-world scenarios where high accuracy is needed but labeled samples are scarce. It proposes a few-shot adversarial sequence domain adaptation approach that achieves performance comparable to state-of-the-art methods.

Scene text recognition (STR) remains an active research topic in computer vision because of its many applications. Existing work mainly focuses on learning a general model from a huge number of synthetic text images to recognize unconstrained scene text, and has achieved substantial progress. However, these methods are not well suited to many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are scarce. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach maximizes the character-level confusion between the source domain and the target domain, and thus achieves sequence-level adaptation with even a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the finetuning scheme and obtains performance comparable to state-of-the-art STR methods.
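The character-level idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`attend`, `character_features`, `cross_domain_pairs`), the toy feature shapes, and the choice to pair features by shared character label are all assumptions made for illustration. The sketch only shows how an attention step could turn a sequence of frame features into one feature per character, and how source and target characters could then be matched at the character level.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(frames, scores):
    """Attention-weighted sum of frame features, giving one character feature.

    frames: list of feature vectors (one per image column / time step).
    scores: unnormalized attention scores, one per frame (hypothetical input).
    """
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def character_features(frames, score_rows, label):
    """One (character, feature) pair per decoded character in the label."""
    return [(ch, attend(frames, row)) for ch, row in zip(label, score_rows)]

def cross_domain_pairs(source_chars, target_chars):
    """Pair source and target character features that share a label.

    Assumption: in an adversarial setup, pairs like these would be fed to a
    domain discriminator that the feature extractor learns to confuse. The
    discriminator and the training loop are omitted here.
    """
    return [(ch_s, f_s, f_t)
            for ch_s, f_s in source_chars
            for ch_t, f_t in target_chars
            if ch_s == ch_t]

# Toy example: two frames of dimension 2, uniform attention scores.
frames = [[1.0, 0.0], [0.0, 1.0]]
source = character_features(frames, [[0.0, 0.0], [0.0, 0.0]], "ab")
target = character_features(frames, [[0.0, 0.0]], "a")
pairs = cross_domain_pairs(source, target)
```

Because matching happens per character rather than per image, a single labeled target word contributes several pairs, which is one way to read the abstract's claim that adaptation works with only a few labeled target samples.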

