CV · Jul 25, 2023

CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer

arXiv:2307.13310v1 · 31 citations · h-index: 89
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting irregularly shaped text in images, offering incremental improvements over existing contour-based methods.

The paper tackles arbitrary-shaped scene text detection by proposing CT-Net, a framework that uses contour transformers for progressive contour regression. It reports F-measures of 86.1 at 11.2 FPS on CTW1500 and 87.8 at 10.1 FPS on Total-Text.

Contour-based scene text detection methods have developed rapidly in recent years, but they still suffer from inaccurate front-end contour initialization, multi-stage error accumulation, or deficient local information aggregation. To tackle these limitations, we propose a novel arbitrary-shaped scene text detection framework named CT-Net, which performs progressive contour regression with contour transformers. Specifically, we first employ a contour initialization module that generates coarse text contours without any post-processing. Then, we adopt contour refinement modules to adaptively refine text contours in an iterative manner, which is beneficial for capturing context information and for progressive global contour deformation. In addition, we propose an adaptive training strategy that enables the contour transformers to learn more potential deformation paths, and introduce a re-score mechanism that can effectively suppress false positives. Extensive experiments are conducted on four challenging datasets, which demonstrate the accuracy and efficiency of our CT-Net over state-of-the-art methods. In particular, CT-Net achieves an F-measure of 86.1 at 11.2 frames per second (FPS) and an F-measure of 87.8 at 10.1 FPS on the CTW1500 and Total-Text datasets, respectively.
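The coarse-to-fine idea in the abstract (initialize a rough contour, then repeatedly move its vertices) can be illustrated with a minimal sketch. It is not the authors' implementation: in CT-Net the per-vertex offsets come from learned contour transformers, whereas here `halfway_to_target` is a hypothetical stand-in that moves each vertex halfway toward a known target contour. The names `init_contour`, `refine`, and `halfway_to_target`, along with the circular initialization and the iteration count, are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Contour = List[Point]


def init_contour(center: Point, radius: float, n_points: int = 8) -> Contour:
    """Coarse initial contour: n_points sampled evenly on a circle.

    Assumption: a stand-in for a contour initialization module.
    """
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * k / n_points),
         cy + radius * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]


def refine(contour: Contour,
           predict_offsets: Callable[[Contour], List[Point]],
           n_iters: int = 3) -> Contour:
    """Progressive contour regression: add predicted per-vertex offsets
    to the current contour, n_iters times in a row."""
    for _ in range(n_iters):
        offsets = predict_offsets(contour)
        contour = [(x + dx, y + dy)
                   for (x, y), (dx, dy) in zip(contour, offsets)]
    return contour


def halfway_to_target(target: Contour) -> Callable[[Contour], List[Point]]:
    """Hypothetical offset predictor: move every vertex halfway toward
    the matching target vertex (a learned model would go here)."""
    def predict(contour: Contour) -> List[Point]:
        return [(0.5 * (tx - x), 0.5 * (ty - y))
                for (x, y), (tx, ty) in zip(contour, target)]
    return predict


# Toy usage: a small initial contour deforms toward a larger target.
target = init_contour((0.0, 0.0), radius=10.0)
coarse = init_contour((0.0, 0.0), radius=2.0)
refined = refine(coarse, halfway_to_target(target), n_iters=3)
```

In this toy setup each iteration halves the remaining gap between a vertex and its target, so the error shrinks geometrically with the number of refinement steps. A real refinement module would predict offsets from image features rather than from a known target.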

