CV · Dec 20, 2020

Sequence-to-Sequence Contrastive Learning for Text Recognition

arXiv:2012.10873v1 · 135 citations
Originality: Highly original
AI Analysis

This method learns improved visual representations for text recognition that are particularly beneficial when supervision is limited, which makes it relevant to researchers and practitioners in OCR and document analysis.

This paper introduces Sequence-to-Sequence Contrastive Learning (SeqCLR) for text recognition, which contrasts at a sub-word level by dividing feature maps into instances. When fine-tuned with 100% labels, SeqCLR achieves state-of-the-art results on standard handwritten text recognition benchmarks.

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
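The core idea, splitting each image's feature sequence into several instances and contrasting them across two augmented views, can be illustrated with a small NumPy sketch. It assumes a "window" instance mapping (non-overlapping groups of consecutive frames are averaged into one instance) and a SimCLR-style NT-Xent loss; the function names `window_to_instances`, `nt_xent` and `seqclr_loss`, the window size, and the temperature `tau` are hypothetical choices for illustration, not the authors' code. The projection head is omitted, and the two views are assumed to stay frame-aligned, so that instance *i* of view A and instance *i* of view B form a positive pair while every other instance in the batch (from the same image or from other images) acts as a negative.

```python
import numpy as np


def window_to_instances(frames: np.ndarray, window: int) -> np.ndarray:
    """Average non-overlapping windows of consecutive frames into instances.

    frames: (T, D) feature sequence for one image (hypothetical layout).
    Returns (T // window, D); leftover frames are dropped (an assumption).
    """
    t, d = frames.shape
    n = t // window
    return frames[: n * window].reshape(n, window, d).mean(axis=1)


def nt_xent(za: np.ndarray, zb: np.ndarray, tau: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss over N positive pairs (za[i], zb[i])."""
    z = np.concatenate([za, zb], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # an instance is never its own negative
    n = za.shape[0]
    # Row i (view A) pairs with row n + i (view B), and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_den = np.log(np.exp(sim - row_max).sum(axis=1)) + row_max[:, 0]
    log_prob = sim[np.arange(2 * n), pos] - log_den
    return float(-log_prob.mean())


def seqclr_loss(view_a: np.ndarray, view_b: np.ndarray,
                window: int, tau: float = 0.5) -> float:
    """view_a, view_b: (B, T, D) frame features of two augmented views."""
    inst_a = np.concatenate([window_to_instances(f, window) for f in view_a])
    inst_b = np.concatenate([window_to_instances(f, window) for f in view_b])
    return nt_xent(inst_a, inst_b, tau)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 24, 16))  # 4 images, 24 frames, 16 channels
noisy = feats + 0.05 * rng.normal(size=feats.shape)
print(seqclr_loss(feats, noisy, window=6))  # 4 instances per image
```

With 24 frames and a window of 6, each image contributes 4 positive pairs, so a batch of 4 images yields 16 pairs and 30 negatives per instance, rather than the single pair per image that a non-sequential contrastive method would produce.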

Code Implementations: 2 repos
