DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images
This work addresses accurate text detection in natural images, with applications such as document analysis and scene understanding, and represents a strong incremental improvement over prior methods.
The paper tackles text detection in natural images by introducing DeepText, a unified framework that uses a fully convolutional neural network for both text proposal generation and text detection, achieving F-measures of 0.83 and 0.85 on the ICDAR 2011 and 2013 benchmarks, respectively.
In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN). First, we propose the inception region proposal network (Inception-RPN) and design a set of text-characteristic prior bounding boxes to achieve high word recall with only hundreds of candidate proposals. Next, we present a powerful text detection network that embeds ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP) for text and non-text classification and accurate localization. Finally, we apply an iterative bounding box voting scheme to pursue high recall in a complementary manner and introduce a filtering algorithm that retains the most suitable bounding box while removing redundant inner and outer boxes for each text instance. Our approach achieves F-measures of 0.83 and 0.85 on the ICDAR 2011 and 2013 robust text detection benchmarks, respectively, outperforming previous state-of-the-art results.
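The post-processing stage described above (bounding box voting followed by removal of redundant inner and outer boxes) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the voting rule, the thresholds, or the filtering criterion. The sketch assumes a single (non-iterative) pass of score-weighted voting among boxes whose IoU exceeds a hypothetical threshold, and a simple containment filter that keeps the higher-scoring box of any nested pair. The function names `vote_boxes` and `drop_nested` and the `iou_thresh` parameter are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed x1 <= x2, y1 <= y2
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def vote_boxes(boxes: List[Box], scores: List[float],
               iou_thresh: float = 0.5) -> List[Detection]:
    """Assumed voting rule: each group of overlapping boxes is replaced by
    the score-weighted average of its members. Scores must be positive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    used = [False] * len(boxes)
    voted: List[Detection] = []
    for i in order:
        if used[i]:
            continue
        # The seed box always belongs to its own group.
        group = [i] + [j for j in order
                       if j != i and not used[j]
                       and iou(boxes[i], boxes[j]) >= iou_thresh]
        for j in group:
            used[j] = True
        total = sum(scores[j] for j in group)
        merged = tuple(
            sum(scores[j] * boxes[j][k] for j in group) / total
            for k in range(4)
        )
        voted.append((merged, scores[i]))  # keep the seed's (highest) score
    return voted


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def drop_nested(dets: List[Detection]) -> List[Detection]:
    """Assumed filter: of any nested pair, keep only the higher-scoring box."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if not any(contains(k, box) or contains(box, k) for k, _ in kept):
            kept.append((box, score))
    return kept
```

In this sketch, `vote_boxes` would be applied to the detector's output first, and `drop_nested` to the voted result. A real pipeline would also need the iteration and the selection criterion for the "most suitable" box that the full paper specifies.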