CVMar 31, 2016

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

arXiv:1603.09423v184 citations
Originality Highly original
AI Analysis

This improves text localization in natural images for applications like document analysis and image understanding, representing a strong specific gain over prior methods.

The paper tackles scene text detection by introducing a Cascaded Convolutional Text Network (CCTN) that uses a coarse-to-fine pipeline for direct text region estimation, achieving F-measures of 0.84 and 0.86 on ICDAR 2011 and 2013 benchmarks.

We introduce a new top-down pipeline for scene text detection. We propose a novel Cascaded Convolutional Text Network (CCTN) that joints two customized convolutional networks for coarse-to-fine text localization. The CCTN fast detects text regions roughly from a low-resolution image, and then accurately localizes text lines from each enlarged region. We cast previous character based detection into direct text region estimation, avoiding multiple bottom- up post-processing steps. It exhibits surprising robustness and discriminative power by considering whole text region as detection object which provides strong semantic information. We customize convolutional network by develop- ing rectangle convolutions and multiple in-network fusions. This enables it to handle multi-shape and multi-scale text efficiently. Furthermore, the CCTN is computationally efficient by sharing convolutional computations, and high-level property allows it to be invariant to various languages and multiple orientations. It achieves 0.84 and 0.86 F-measures on the ICDAR 2011 and ICDAR 2013, delivering substantial improvements over state-of-the-art results [23, 1].

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes