CV · Aug 4, 2019

Deep Neural Network for Semantic-based Text Recognition in Images

arXiv:1908.01403v3 · 3.43 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of recognizing text in complex scenes, with applications such as document analysis and surveillance, and offers an incremental improvement over existing methods.

The paper tackles the problem of recognizing text in images by incorporating semantic context to improve accuracy, achieving 90% accuracy on catalog images and 71% on protest images.

State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images by understanding its context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new, distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-word recognition techniques on both datasets. STR yields a high accuracy rate of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text.
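The abstract does not specify how TGA groups and orders regions or how the corrector is trained. The Python sketch below is a hypothetical illustration of the two ideas under simple assumptions. Recognized words arrive as axis-aligned boxes. Words are grouped into lines by vertical-center proximity and read top-to-bottom, left-to-right. A fuzzy match of the whole phrase against a small list of known phrases (standard-library `difflib`) stands in for the sequence-to-sequence corrector. `WordBox`, `group_and_order`, `correct_phrase`, `line_tol`, and the sample phrases are illustrative names and data, not the authors' code.

```python
from dataclasses import dataclass
import difflib


@dataclass
class WordBox:
    """A recognized word and its axis-aligned bounding box (pixels)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cy(self) -> float:
        """Vertical center of the box."""
        return self.y + self.h / 2


def group_and_order(boxes, line_tol=0.5):
    """Group word boxes into lines and return them in reading order.

    Hypothetical heuristic, NOT the paper's TGA algorithm: a box joins the
    current line if its vertical center lies within line_tol * (box height)
    of the line's mean center. Lines run top-to-bottom, words left-to-right.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.cy):
        if lines:
            current = lines[-1]
            mean_cy = sum(b.cy for b in current) / len(current)
            if abs(box.cy - mean_cy) <= line_tol * box.h:
                current.append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b.x) for line in lines]


def lines_to_phrase(lines):
    """Join ordered lines into one phrase string."""
    return " ".join(b.text for line in lines for b in line)


def correct_phrase(phrase, known_phrases, cutoff=0.8):
    """Stand-in for the seq2seq corrector (assumption): snap the whole
    phrase to the most similar known phrase, or keep it unchanged."""
    matches = difflib.get_close_matches(phrase, known_phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else phrase


if __name__ == "__main__":
    # Boxes arrive unordered; "1" and "0" are single-word misreads of I and O.
    boxes = [
        WordBox("ACTI0N", 110, 12, 90, 30),
        WordBox("CL1MATE", 10, 10, 95, 30),
        WordBox("NOW", 60, 55, 50, 28),
    ]
    lines = group_and_order(boxes)
    raw = lines_to_phrase(lines)
    known = ["CLIMATE ACTION NOW", "PEACE NOT WAR"]
    print(raw, "->", correct_phrase(raw, known))
```

Matching the whole phrase rather than each word lets context repair characters that a single-word recognizer confused (here `1`/`I` and `0`/`O`). The paper's sequence-to-sequence model is meant to learn such corrections instead of relying on a fixed phrase list.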

