CV · Jul 6, 2018

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

arXiv:1807.02242v2 · 641 citations
Originality: Highly original
AI Analysis

This paper addresses the challenge of spotting text with arbitrary shapes, such as curved text, in natural images, a capability useful in applications such as document analysis. It represents a strong incremental improvement over prior methods.

The paper tackles scene text spotting, the simultaneous detection and recognition of text in natural images, by proposing an end-to-end trainable neural network called Mask TextSpotter, which achieves state-of-the-art results on the ICDAR2013, ICDAR2015, and Total-Text benchmarks.

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. We propose an end-to-end trainable neural network model for scene text spotting, named Mask TextSpotter, which is inspired by the recently published Mask R-CNN. Unlike previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter benefits from a simple and smooth end-to-end learning procedure in which precise text detection and recognition are obtained via semantic segmentation. Moreover, it handles text instances of irregular shapes, such as curved text, better than previous methods. Experiments on ICDAR2013, ICDAR2015, and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition.
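To make the idea of recognizing text "via semantic segmentation" more concrete, here is a minimal Python sketch of one way a word could be read out of per-pixel maps. It is an illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical background-probability map plus one probability map per character class. Foreground pixels are grouped into 4-connected blobs, each blob votes for the character with the highest mean probability inside it, and the blobs are ordered left to right. The names `connected_regions` and `decode_word`, the `fg_threshold` of 0.5, and the left-to-right ordering rule are all illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Grid = List[List[float]]


def connected_regions(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Return 4-connected regions of True pixels as lists of (row, col)."""
    if not mask or not mask[0]:
        return []
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def decode_word(background: Grid, char_maps: Dict[str, Grid],
                fg_threshold: float = 0.5) -> str:
    """Hypothetical decoder: one character per blob, chosen by mean-probability vote."""
    # Assumption: a pixel belongs to some character when 1 - P(background) > fg_threshold.
    mask = [[1.0 - p > fg_threshold for p in row] for row in background]
    chars = []
    for region in connected_regions(mask):
        # Average each class's probability over the blob; the highest mean wins.
        best = max(char_maps,
                   key=lambda ch: sum(char_maps[ch][y][x] for y, x in region) / len(region))
        mean_col = sum(x for _, x in region) / len(region)
        chars.append((mean_col, best))
    # Assumption: roughly horizontal text, so blobs are read left to right.
    return "".join(ch for _, ch in sorted(chars))


if __name__ == "__main__":
    bg = [[0.1, 0.1, 0.9, 0.2, 0.2],
          [0.1, 0.1, 0.9, 0.2, 0.2]]
    maps = {
        "a": [[0.8, 0.7, 0.0, 0.1, 0.2], [0.9, 0.6, 0.0, 0.3, 0.1]],
        "b": [[0.1, 0.2, 0.0, 0.7, 0.8], [0.0, 0.3, 0.0, 0.6, 0.9]],
    }
    print(decode_word(bg, maps))  # prints: ab
```

Because each character is read from its own region of the map rather than from a fixed left-to-right sequence of features, a decoder of this kind does not require the text to lie on a straight line. This is the property the abstract highlights for curved text. A real system would need a better ordering rule than sorting by mean column for strongly curved words.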

Code Implementations: 1 repo