CV May 12, 2021

MT: Multi-Perspective Feature Learning Network for Scene Text Detection

arXiv:2105.05455v1
Originality Incremental advance
AI Analysis

This work addresses the need for efficient and accurate text detection in intelligent systems, representing an incremental improvement over existing methods.

The paper tackles the problem of detecting arbitrary-shaped scene text with both high speed and high accuracy. It proposes MT, a multi-perspective feature learning network that needs only a single binary mask at inference time, and reports state-of-the-art results on four real-world datasets.

Text detection, the key technology for understanding scene text, has become an attractive research topic. To detect varied scene text, researchers have proposed many detectors with different advantages: detection-based models offer fast detection speed, while segmentation-based algorithms are not limited by text shape. However, most intelligent systems need a detector that handles arbitrary-shaped text with high speed and high accuracy at the same time. In this study, we therefore design an efficient pipeline named MT, which can detect adhesive arbitrary-shaped text using only a single binary mask in the inference stage. This paper makes three contributions: (1) a lightweight detection framework that speeds up inference while maintaining high detection accuracy; (2) a multi-perspective feature module that learns more discriminative representations so that the mask can be segmented accurately; (3) a multi-factor constraints IoU minimization loss for training the proposed model. MT is evaluated on four real-world scene text datasets, where it surpasses all the state-of-the-art competitors to a large extent.
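The abstract names an IoU-based training loss over a binary text mask but does not give its form. As a rough illustration of the general idea only, the sketch below implements a generic soft-IoU loss between a predicted probability map and a ground-truth mask. It is not the paper's multi-factor constraints loss; the function name `soft_iou_loss`, the `eps` smoothing term, and the nested-list mask representation are all assumptions made for this example.

```python
def soft_iou_loss(pred, target, eps=1e-6):
    """Generic soft-IoU loss: 1 - intersection / union.

    pred   : 2D list of per-pixel text probabilities in [0, 1]
    target : 2D list of ground-truth labels (0 or 1), same shape as pred
    eps    : hypothetical smoothing term that avoids division by zero
    NOTE: illustrative only; this is not the loss defined in the MT paper.
    """
    inter = 0.0
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, g in zip(pred_row, target_row):
            inter += p * g   # soft intersection
            total += p + g   # sum of both areas
    union = total - inter
    return 1.0 - (inter + eps) / (union + eps)


# Toy masks: a perfect prediction, a disjoint one, and a half-overlapping one.
perfect = soft_iou_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
disjoint = soft_iou_loss([[1, 0]], [[0, 1]])
half = soft_iou_loss([[1, 1]], [[1, 0]])
```

Minimizing a loss of this kind pushes the predicted mask toward full overlap with the ground truth, which is why IoU-style objectives are a common choice when a single mask is the only output used at inference.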

