CVLGJun 5, 2019

Efficient, Lexicon-Free OCR using Deep Learning

arXiv:1906.01969v142 citations
Originality Incremental advance
AI Analysis

This work addresses OCR challenges for applications in natural scene text recognition, but it is incremental as it builds on existing deep learning and data augmentation techniques without introducing a fundamentally new paradigm.

The paper tackles the problem of Optical Character Recognition (OCR) in unconstrained environments like natural scenes by developing a segmentation-free system using deep learning, synthetic data generation, and data augmentation, achieving improved performance through models that combine convolutional neural networks with recurrent or convolutional networks for sequence modeling.

Contrary to popular belief, Optical Character Recognition (OCR) remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. In this paper, we present a segmentation-free OCR system that combines deep learning methods, synthetic training data generation, and data augmentation techniques. We render synthetic training data using large text corpora and over 2000 fonts. To simulate text occurring in complex natural scenes, we augment extracted samples with geometric distortions and with a proposed data augmentation technique - alpha-compositing with background textures. Our models employ a convolutional neural network encoder to extract features from text images. Inspired by the recent progress in neural machine translation and language modeling, we examine the capabilities of both recurrent and convolutional neural networks in modeling the interactions between input elements.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes