CV, Mar 11, 2019

MTRNet: A Generic Scene Text Eraser

arXiv:1903.04092v3, 58 citations
Originality: Incremental advance
AI Analysis

MTRNet addresses the need for a text removal method that works across fonts, scripts, languages, and shapes. The advance is incremental, since it builds on existing cGAN approaches.

The paper tackles generic text removal in real scenes and proposes MTRNet, which achieves state-of-the-art results on datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500 without being explicitly trained on them.

Text removal algorithms have been proposed for uni-lingual scripts with regular shapes and layouts. However, to the best of our knowledge, no generic text removal method is available that can remove all or user-specified text regions regardless of font, script, language or shape. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional generative adversarial network (cGAN) with an auxiliary mask. The auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset, initially proposed for text detection in real scenes. Moreover, MTRNet achieves state-of-the-art results on several real-world datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without being explicitly trained on this data, outperforming previous state-of-the-art methods trained directly on these datasets.
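To make the role of the auxiliary mask concrete, here is a minimal sketch of how user-specified text regions could be turned into a binary mask and paired with the image as input to a mask-conditioned generator. The abstract does not say how the mask is fed to the network, so the channel-wise concatenation shown here is an assumption, and the helper names `boxes_to_mask` and `make_generator_input` are hypothetical, not part of any released MTRNet code. The generator and discriminator themselves are omitted.

```python
import numpy as np


def boxes_to_mask(height, width, boxes):
    """Rasterize axis-aligned boxes (x0, y0, x1, y1) into a binary mask.

    Pixels inside any box are 1.0 (text to erase); all others are 0.0.
    Boxes are a simplification: curved text would need polygon masks.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask


def make_generator_input(image, mask):
    """Stack an RGB image (H, W, 3) and a mask (H, W) into (H, W, 4).

    Assumption: the mask conditions the generator as an extra channel.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    return np.concatenate([image, mask[..., None]], axis=-1)


# Placeholder image; a real pipeline would load a scene photograph.
image = np.random.default_rng(0).random((64, 128, 3), dtype=np.float32)
mask = boxes_to_mask(64, 128, [(10, 5, 50, 20)])
x = make_generator_input(image, mask)
```

Under this assumption, erasing all text versus only selected regions is just a matter of which regions are marked in the mask, which is one way to read the abstract's "all or user-specified text regions".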

Code Implementations: 1 repo
