CV, Jun 14, 2015

Reading Scene Text in Deep Convolutional Sequences

Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang

arXiv:1506.04395v2 26.9319 citations

Originality: Highly original

AI Analysis

This paper addresses the problem of reading distorted or ambiguous text in images, a computer vision task, and is assessed as a novel method rather than an incremental improvement.

The paper treats scene text recognition as a sequence labeling problem. Its Deep-Text Recurrent Network (DTRN) combines a CNN with an LSTM, which avoids character segmentation and exploits context, so that recognition works reliably without pre- or post-processing.

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognise the generated CNN sequences, departing from most existing approaches, which recognise each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) it can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in the word image, which is essential to discriminate word strings; (iv) the model does not depend on a pre-defined dictionary, and it can process unknown words and arbitrary strings. Code for the DTRN will be made available.
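
The sequence-labelling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CNN and LSTM are omitted entirely, and the window width, stride, blank label, and function names are all hypothetical. The first function cuts a word image into an ordered, left-to-right sequence of windows, which is where CNN features would be computed. The second assumes a CTC-style convention (an assumption, since the abstract does not describe the decoding step) in which per-frame labels are merged and blanks dropped to obtain the final string.

```python
from typing import List, Sequence

BLANK = "-"  # hypothetical "no character" label for frames between letters


def sliding_windows(image: Sequence[Sequence[float]],
                    width: int, stride: int) -> List[List[List[float]]]:
    """Cut a word image (a list of pixel rows) into overlapping windows.

    Windows are returned in left-to-right order, so the sequence keeps
    the order information of the word image.
    """
    n_cols = len(image[0])
    windows = []
    for start in range(0, n_cols - width + 1, stride):
        windows.append([list(row[start:start + width]) for row in image])
    return windows


def collapse_frame_labels(frame_labels: Sequence[str],
                          blank: str = BLANK) -> str:
    """Merge consecutive repeated labels, then drop blanks.

    This is an assumed CTC-style greedy collapse; in a real system the
    per-frame labels would come from the recurrent model's outputs.
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


# Toy 2x6 "image"; real inputs would be grey-scale word crops.
image = [[0, 1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10, 11]]
windows = sliding_windows(image, width=2, stride=1)
print(len(windows))

# Hand-written per-frame labels standing in for model predictions.
print(collapse_frame_labels(list("--cc-aa--t-")))
print(collapse_frame_labels(list("-b-oo-o-k")))
```

Because no step isolates individual characters, the sketch needs no character segmentation, and the blank label between the two `o` frames is what lets a doubled letter survive the collapse.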
