CVJun 14, 2015

Reading Scene Text in Deep Convolutional Sequences

arXiv:1506.04395v2319 citations
Originality Highly original
AI Analysis

This addresses the problem of reading distorted or ambiguous text in images for computer vision applications, representing a novel method rather than an incremental improvement.

The paper tackles scene text recognition by treating it as a sequence labeling problem, using a Deep-Text Recurrent Network (DTRN) that combines CNN and LSTM to avoid character segmentation and leverage context, achieving reliable recognition without pre- or post-processing.

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in word image, which is essential to discriminate word strings; (iv) the model does not depend on pre-defined dictionary, and it can process unknown words and arbitrary strings. Codes for the DTRN will be available.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes