MTRNet++: One-stage Mask-based Scene Text Eraser
MTRNet++ provides a precise, controllable, and interpretable text removal method for user-specific and large-scale applications.
The paper tackles the problem of scene text removal by proposing MTRNet++, a one-stage mask-based text inpainting network that can remove text with or without an external mask, achieving state-of-the-art results on the Oxford and SCUT datasets.
A precise, controllable, interpretable, and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To this end, we propose MTRNet++, a one-stage mask-based text inpainting network. Its novel architecture comprises mask-refine, coarse-inpainting, and fine-inpainting branches, together with attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. Ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. MTRNet++ also demonstrates controllability and interpretability.
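To make the data flow between the three branches concrete, the following is a minimal toy sketch in plain Python, not the network described in the paper. Every function name (`refine_mask`, `coarse_inpaint`, `fine_inpaint`, `erase_text`) is hypothetical, each learned branch is replaced by a simple hand-written rule, and the attention blocks are not modeled. It also assumes that running without an external mask is equivalent to treating every pixel as a candidate, and that the output is composited from the inpainted result and the input using the refined mask; neither assumption is stated in the abstract.

```python
from typing import List, Optional

Grid = List[List[float]]  # grayscale values in [0, 1]; in masks, 1.0 = erase


def refine_mask(image: Grid, mask: Grid, dark: float = 0.5) -> Grid:
    # Toy stand-in for the mask-refine branch: keep only candidate pixels
    # that look like dark text strokes.
    return [[1.0 if m > 0.5 and px < dark else 0.0 for px, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


def coarse_inpaint(image: Grid, mask: Grid) -> Grid:
    # Toy stand-in for the coarse-inpainting branch: fill erased pixels
    # with the mean of the pixels that are kept.
    kept = [px for irow, mrow in zip(image, mask)
            for px, m in zip(irow, mrow) if m == 0.0]
    fill = sum(kept) / len(kept) if kept else 0.0
    return [[fill if m == 1.0 else px for px, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


def fine_inpaint(coarse: Grid, mask: Grid) -> Grid:
    # Toy stand-in for the fine-inpainting branch: replace each erased pixel
    # with the 3x3 neighbourhood average of the coarse result.
    h, w = len(coarse), len(coarse[0])
    out = [row[:] for row in coarse]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1.0:
                window = [coarse[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
                out[y][x] = sum(window) / len(window)
    return out


def erase_text(image: Grid, external_mask: Optional[Grid] = None) -> Grid:
    # Assumption: with no external mask, every pixel is a candidate.
    if external_mask is None:
        external_mask = [[1.0] * len(row) for row in image]
    refined = refine_mask(image, external_mask)
    fine = fine_inpaint(coarse_inpaint(image, refined), refined)
    # Assumed compositing: erased pixels come from the inpainting,
    # all other pixels are copied from the input.
    return [[m * f + (1.0 - m) * px for px, f, m in zip(irow, frow, mrow)]
            for irow, frow, mrow in zip(image, fine, refined)]


# A light background with one dark "text" pixel in the centre.
image = [[0.75, 0.75, 0.75],
         [0.75, 0.25, 0.75],
         [0.75, 0.75, 0.75]]
without_mask = erase_text(image)
with_mask = erase_text(image, [[0.0, 0.0, 0.0],
                               [0.0, 1.0, 1.0],
                               [0.0, 0.0, 0.0]])
```

In this sketch the external mask only narrows where erasure may happen, which is one way to picture the controllability mentioned above: a loose mask is tightened by the refine step before any pixel is changed, and pixels outside the refined mask pass through untouched.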