CV · May 24, 2016

DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images

arXiv:1605.07314v1 · 117 citations
Originality: Highly original
AI Analysis

This work addresses the problem of accurately detecting text in images for applications like document analysis and scene understanding, representing a strong incremental improvement over prior methods.

The paper tackles text detection in natural images by introducing DeepText, a unified framework that uses a fully convolutional neural network for text proposal generation and detection, achieving F-measures of 0.83 and 0.85 on the ICDAR 2011 and 2013 benchmarks, respectively.

In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN). First, we propose the inception region proposal network (Inception-RPN) and design a set of text characteristic prior bounding boxes to achieve high word recall with only hundreds of candidate proposals. Next, we present a powerful text detection network that embeds ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP) for text and non-text classification and accurate localization. Finally, we apply an iterative bounding box voting scheme to pursue high recall in a complementary manner and introduce a filtering algorithm to retain the most suitable bounding box, while removing redundant inner and outer boxes for each text instance. Our approach achieves F-measures of 0.83 and 0.85 on the ICDAR 2011 and 2013 robust text detection benchmarks, respectively, outperforming previous state-of-the-art results.
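The post-processing step in the abstract (bounding box voting followed by removal of redundant inner and outer boxes) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the score-weighted averaging rule, the IoU threshold of 0.5, the strict-containment test, and the function names `iou`, `vote_box`, and `filter_nested` are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def vote_box(target: Box, boxes: List[Box], scores: List[float],
             iou_thresh: float = 0.5) -> Box:
    """Score-weighted average of all boxes overlapping `target`.

    Assumed voting rule: every box with IoU >= iou_thresh contributes
    its coordinates, weighted by its detection score.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0, 0.0]
    for box, score in zip(boxes, scores):
        if iou(target, box) >= iou_thresh:
            total += score
            for i in range(4):
                acc[i] += score * box[i]
    if total == 0.0:
        return target  # no voters: leave the box unchanged
    return (acc[0] / total, acc[1] / total, acc[2] / total, acc[3] / total)


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def filter_nested(boxes: List[Box], scores: List[float]) -> List[int]:
    """Indices of boxes kept after dropping redundant nested boxes.

    Assumed rule: when one box contains another, keep only the
    higher-scoring box of the pair (ties go to the lower index).
    """
    keep = []
    for i, (bi, si) in enumerate(zip(boxes, scores)):
        redundant = False
        for j, (bj, sj) in enumerate(zip(boxes, scores)):
            if i == j:
                continue
            nested = contains(bj, bi) or contains(bi, bj)
            if nested and (sj > si or (sj == si and j < i)):
                redundant = True
                break
        if not redundant:
            keep.append(i)
    return keep
```

For example, with hypothetical detections `[(0, 0, 10, 10), (2, 2, 8, 8), (20, 20, 30, 30)]` scored `[0.9, 0.6, 0.8]`, `filter_nested` drops the second box because it sits inside the higher-scoring first box. A real pipeline would run voting over detections from several iterations before filtering; that loop is omitted here.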

Code Implementations: 5 repos
