CV · Sep 25, 2024

Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera

arXiv:2409.16820v1 · 12 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurately detecting irregular and overlapping text in scene images, for applications such as document analysis and autonomous systems. It is an incremental improvement over existing segmentation-based and shrink-based methods.

The paper tackles the challenge of detecting irregularly shaped scene text by proposing a Spotlight Text Detector (STD) that uses a spotlight calibration module to focus on candidate regions and a multivariate information extraction module to handle diverse text geometries, achieving state-of-the-art performance on multiple benchmark datasets including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.

Representing irregular contours is one of the hardest challenges in scene text detection. Although segmentation-based methods have made significant progress thanks to flexible pixel-level prediction, overlap between spatially close texts makes it hard to detect them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to reconstruct the texts. However, a text kernel is an artificial object with incomplete semantic features, which makes it prone to incorrect or missed detections. In addition, unlike general objects, scene texts vary significantly in their geometric features (aspect ratio, scale, and shape), which makes them difficult to detect accurately. To address these problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates on the candidate kernel, like a camera focusing on its target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features of scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies demonstrate the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.
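The shrink-then-expand idea that the abstract attributes to earlier shrink-based methods can be illustrated on a toy binary mask. The sketch below is not the STD, SCM, or MIEM implementation. It assumes a plain 4-neighbour erosion as a stand-in for kernel shrinking and a breadth-first label growth as a stand-in for kernel expansion, and the function names (`label_components`, `erode`, `expand`) are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def label_components(mask):
    """4-connected component labelling. Returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def erode(mask):
    """Keep a pixel only if it and its four neighbours are all text."""
    h, w = len(mask), len(mask[0])

    def on(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x] == 1

    return [[1 if on(y, x) and all(on(y + dy, x + dx) for dy, dx in NEIGHBOURS) else 0
             for x in range(w)] for y in range(h)]

def expand(kernel_labels, mask):
    """Grow each kernel label outward, staying inside the text mask."""
    h, w = len(mask), len(mask[0])
    labels = [row[:] for row in kernel_labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = cy + dy, cx + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[cy][cx]
                queue.append((ny, nx))
    return labels

# Toy mask: two 5x5 text blocks joined by a one-pixel bridge.
H, W = 7, 13
mask = [[0] * W for _ in range(H)]
for y in range(1, 6):
    for x in range(1, 6):
        mask[y][x] = 1
    for x in range(7, 12):
        mask[y][x] = 1
mask[3][6] = 1

_, merged_count = label_components(mask)                   # plain segmentation
kernel_labels, kernel_count = label_components(erode(mask))  # shrunken kernels
instances = expand(kernel_labels, mask)                    # kernels grown back
print(merged_count, kernel_count)
```

On this mask, labelling the raw segmentation merges the two touching blocks into one region, while labelling the eroded kernels separates them before they are grown back to full size. Real shrink-based detectors shrink annotated polygons and predict the kernels with a network, so the erosion here only mimics the geometry of that step.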

