CV · Mar 28, 2019

Pyramid Mask Text Detector

arXiv:1903.11800v1 · 72 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate text detection in natural scene images, a prerequisite for applications such as scene text recognition, and represents an incremental improvement over existing Mask R-CNN-based methods.

The paper tackles scene text detection by proposing a Mask R-CNN-based framework called Pyramid Mask Text Detector (PMTD), which uses pixel-level regression and a novel plane clustering algorithm to generate soft text masks and optimal text boxes, achieving an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
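To make the idea of a soft, location-aware text mask concrete, the sketch below computes a pyramid-shaped regression target for a convex quadrilateral text box. It rests on an assumption that is not stated above: the target equals 1 at an apex placed at the mean of the four vertices and falls linearly to 0 along every edge. The names `pyramid_label` and `soft_mask` are hypothetical, and this is an illustration rather than the authors' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pyramid_label(p: Point, quad: List[Point]) -> float:
    """Soft regression target in [0, 1] for a pixel p and a convex quadrilateral.

    ASSUMPTION (illustrative only): the pyramid apex, with height 1, sits at the
    mean of the four vertices, and the height falls linearly to 0 on each edge.
    """
    cx = sum(x for x, _ in quad) / 4.0
    cy = sum(y for _, y in quad) / 4.0
    px, py = p[0] - cx, p[1] - cy
    for i in range(4):
        # Triangular face spanned by the apex and the edge quad[i] -> quad[i + 1].
        ax, ay = quad[i][0] - cx, quad[i][1] - cy
        bx, by = quad[(i + 1) % 4][0] - cx, quad[(i + 1) % 4][1] - cy
        det = ax * by - ay * bx
        if abs(det) < 1e-12:
            continue
        # Solve (p - apex) = alpha * a + beta * b with Cramer's rule.
        alpha = (px * by - py * bx) / det
        beta = (ax * py - ay * px) / det
        if alpha >= -1e-9 and beta >= -1e-9:
            # p lies in this face's sector: alpha + beta is 0 at the apex and 1 on the edge.
            return max(0.0, 1.0 - (alpha + beta))
    return 0.0


def soft_mask(quad: List[Point], width: int, height: int) -> List[List[float]]:
    """Evaluate the target at every pixel centre of a width x height grid."""
    return [[pyramid_label((x + 0.5, y + 0.5), quad) for x in range(width)]
            for y in range(height)]


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(pyramid_label((2.0, 2.0), square))  # 1.0 (apex)
print(pyramid_label((1.0, 2.0), square))  # 0.5 (halfway to the left edge)
```

Compared with a binary mask, every pixel now carries information about its position relative to the box boundary, which is what the box-generation step can later exploit.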

Scene text detection, an essential step in scene text recognition systems, aims to locate text instances in natural scene images automatically. Some recent attempts, benefiting from Mask R-CNN, formulate the scene text detection task as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN-based framework named Pyramid Mask Text Detector (PMTD) to handle scene text detection. Instead of the binary text masks generated by existing Mask R-CNN-based methods, PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. For the generation of text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the resulting 3D shape. Experiments on standard datasets demonstrate that PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
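The box-generation step described in the abstract can be sketched as an iterative fit of four planes to the soft mask viewed as 3D points (x, y, score). This is a simplified sketch under stated assumptions, not the authors' implementation: the four faces are initialized from a unit-height pyramid over an axis-aligned proposal box, points are assigned by vertical residual (the paper's exact distance measure is not given above), each face is refit by least squares, and the quadrilateral's vertices are taken where adjacent faces meet the z = 0 plane. All names (`plane_clustering`, `fit_plane`, `initial_planes`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical types for this sketch.
Plane = Tuple[float, float, float]       # (a, b, c) describing z = a*x + b*y + c
Point3 = Tuple[float, float, float]      # (x, y, soft-mask score)
Box = Tuple[float, float, float, float]  # axis-aligned proposal (x0, y0, x1, y1)


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(pts: List[Point3]) -> Optional[Plane]:
    """Least-squares fit of z = a*x + b*y + c via the 3x3 normal equations."""
    if len(pts) < 3:
        return None
    sx = sum(p[0] for p in pts)
    sy = sum(p[1] for p in pts)
    sz = sum(p[2] for p in pts)
    sxx = sum(p[0] * p[0] for p in pts)
    syy = sum(p[1] * p[1] for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    sxz = sum(p[0] * p[2] for p in pts)
    syz = sum(p[1] * p[2] for p in pts)
    a_mat = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(pts))]]
    rhs = [sxz, syz, sz]
    d = _det3(a_mat)
    if abs(d) < 1e-9:
        return None  # simple guard against degenerate (e.g. collinear) clusters
    sol = []
    for col in range(3):  # Cramer's rule, one unknown per column
        m = [row[:] for row in a_mat]
        for r in range(3):
            m[r][col] = rhs[r]
        sol.append(_det3(m) / d)
    return (sol[0], sol[1], sol[2])


def initial_planes(box: Box) -> List[Plane]:
    """Faces of a unit-height pyramid over the box, ordered top, right, bottom, left."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return [
        (0.0, 1.0 / (cy - y0), -y0 / (cy - y0)),  # top:    z = (y - y0) / (cy - y0)
        (-1.0 / (x1 - cx), 0.0, x1 / (x1 - cx)),  # right:  z = (x1 - x) / (x1 - cx)
        (0.0, -1.0 / (y1 - cy), y1 / (y1 - cy)),  # bottom: z = (y1 - y) / (y1 - cy)
        (1.0 / (cx - x0), 0.0, -x0 / (cx - x0)),  # left:   z = (x - x0) / (cx - x0)
    ]


def _intersect(p: Plane, q: Plane) -> Tuple[float, float]:
    """Meet point of the z = 0 traces a*x + b*y + c = 0 of two planes."""
    a1, b1, c1 = p
    a2, b2, c2 = q
    det = a1 * b2 - a2 * b1
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)


def plane_clustering(points: List[Point3], box: Box, iters: int = 10) -> List[Tuple[float, float]]:
    planes = initial_planes(box)
    for _ in range(iters):
        clusters: List[List[Point3]] = [[] for _ in planes]
        for x, y, z in points:
            # Assign each point to the plane with the smallest vertical residual.
            k = min(range(4), key=lambda i: abs(planes[i][0] * x + planes[i][1] * y + planes[i][2] - z))
            clusters[k].append((x, y, z))
        refit = [fit_plane(c) for c in clusters]
        # Keep the previous plane when a cluster is too small or degenerate.
        planes = [new if new is not None else old for new, old in zip(refit, planes)]
    # Vertex i is where the base lines of faces i-1 and i meet.
    return [_intersect(planes[i - 1], planes[i]) for i in range(4)]
```

With the face order used here, the returned vertices come out as top-left, top-right, bottom-right, bottom-left. Fitting planes to all mask points, rather than thresholding the mask and tracing a contour, lets every pixel's score contribute to where each edge is placed.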

Code Implementations: 2 repos