CV · Jul 25, 2021

Comprehensive Studies for Arbitrary-shape Scene Text Detection

arXiv:2107.11800v12 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses benchmarking inconsistencies that affect researchers in scene text detection. It is incremental, since it focuses on analysis rather than introducing a new method.

The paper tackles unfair performance comparisons in arbitrary-shape scene text detection by proposing a unified framework that keeps the settings of non-core modules consistent, which clarifies the strengths and weaknesses of existing methods under fair conditions.

Numerous scene text detection methods have been proposed in recent years, and most of them claim state-of-the-art performance. However, the performance comparisons are unfair because of many inconsistent settings (e.g., training data, backbone network, multi-scale feature fusion, and evaluation protocols). These varied settings can obscure the strengths and weaknesses of the proposed core techniques. In this paper, we carefully examine and analyze the inconsistent settings and propose a unified framework for bottom-up scene text detection methods. Under this framework, we keep the settings of non-core modules consistent and mainly investigate the representations used to describe arbitrary-shape scene text, e.g., regressing points on text contours, clustering pixels with predicted auxiliary information, and grouping connected components with learned linkages. These comprehensive investigations and detailed analyses not only remove obstacles to understanding the performance differences between existing methods but also reveal the advantages and disadvantages of previous models under fair comparison.
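To make two of the bottom-up representations named above more concrete, here is a minimal Python sketch. It assumes a post-processing stage that already has (a) connected components with predicted pairwise link scores and (b) per-pixel and per-kernel embedding vectors. The names `group_by_linkage` and `cluster_pixels`, the 0.5 link threshold, and the 1.0 distance cutoff are hypothetical illustrations, not the paper's code or any particular detector's API. Union-find merging and nearest-kernel assignment are generic stand-ins for the "grouping with learned linkages" and "clustering pixels with auxiliary information" steps.

```python
import math
from typing import Dict, List, Sequence, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Walk up to the root, then point every visited node directly at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def group_by_linkage(
    num_components: int,
    link_scores: Dict[Tuple[int, int], float],
    threshold: float = 0.5,  # hypothetical cutoff on the predicted link score
) -> List[List[int]]:
    """Merge components whose predicted link score exceeds `threshold`.

    Each returned group (a sorted list of component ids) is one text instance.
    """
    uf = UnionFind(num_components)
    for (i, j), score in link_scores.items():
        if score > threshold:
            uf.union(i, j)
    groups: Dict[int, List[int]] = {}
    for c in range(num_components):
        groups.setdefault(uf.find(c), []).append(c)
    return sorted(groups.values())


def cluster_pixels(
    pixel_embeddings: Sequence[Sequence[float]],
    kernel_embeddings: Sequence[Sequence[float]],
    max_dist: float = 1.0,  # hypothetical embedding-distance cutoff
) -> List[int]:
    """Label each text pixel with the index of the nearest kernel embedding.

    A pixel gets -1 when no kernel is strictly closer than `max_dist`.
    """
    labels = []
    for p in pixel_embeddings:
        best, best_d = -1, max_dist
        for k, kern in enumerate(kernel_embeddings):
            d = math.dist(p, kern)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best, best_d = k, d
        labels.append(best)
    return labels


if __name__ == "__main__":
    # Five components: links (0, 1) and (1, 2) are confident, (3, 4) is not.
    scores = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.2}
    print(group_by_linkage(5, scores))  # [[0, 1, 2], [3], [4]]

    # Two kernels; the third pixel is too far from both and stays unassigned.
    pixels = [(0.1, 0.0), (2.0, 2.1), (5.0, 5.0)]
    kernels = [(0.0, 0.0), (2.0, 2.0)]
    print(cluster_pixels(pixels, kernels))  # [0, 1, -1]
```

Union-find keeps linkage grouping close to linear in the number of predicted links, which matters when a bottom-up detector emits many small components per image. Under a unified framework of the kind the paper describes, only decoders like these would change between methods, while the backbone, feature fusion, training data, and evaluation protocol stay fixed.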
