CV Mar 10, 2022

Real-time Scene Text Detection Based on Global Level and Word Level Features

Fuqiang Zhao, Jionghua Yu, Enjun Xing, Wenming Song, Xue Xu

arXiv:2203.05251v13.72 citations h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the problem of real-time scene text detection for applications like image analysis and autonomous systems, presenting an incremental improvement over existing methods.

The paper tackles the challenge of detecting arbitrary shape text in natural scenes with high accuracy and efficiency, proposing GWNet, a framework that achieves state-of-the-art performance with F-measures ranging from 87.5% to 89.2% on four benchmark datasets.

Detecting arbitrary-shape text in natural scenes with both high accuracy and high efficiency is an extremely challenging task. In this paper, we propose a scene text detection framework, GWNet, which consists mainly of two modules: a Global module and an RCNN module. The Global module improves the adaptive performance of the DB (Differentiable Binarization) module by adding a k submodule and a shift submodule. These two submodules make the amplifying factor k more adaptable, accelerate model convergence, and help produce more accurate detection results. The RCNN module fuses global-level and word-level features. The word-level label is generated by taking the minimum axis-aligned rectangle that encloses each shrunk polygon. During inference, GWNet uses only global-level features to output simple polygon detections. Experiments on four benchmark datasets (MSRA-TD500, Total-Text, ICDAR2015 and CTW-1500) demonstrate that GWNet outperforms state-of-the-art detectors. Specifically, with a ResNet-50 backbone, we achieve an F-measure of 88.6% on MSRA-TD500, 87.9% on Total-Text, 89.2% on ICDAR2015 and 87.5% on CTW-1500.
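The two mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the standard differentiable binarization step from the original DB work, 1 / (1 + exp(-k (P - T))), with a fixed amplifying factor k, and it does not reproduce GWNet's k and shift submodules, whose details are not given here. The function names and the example polygon are hypothetical and chosen only for illustration.

```python
import math


def differentiable_binarization(p, t, k=50.0):
    """Soft step from the DB formulation: 1 / (1 + exp(-k * (p - t))).

    p is a probability-map value, t a threshold-map value, and k the
    amplifying factor (fixed here; GWNet adapts it, which is not modeled).
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def axis_aligned_box(polygon):
    """Minimum axis-aligned rectangle (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical, already-shrunk text polygon in pixel coordinates.
shrunk_polygon = [(12, 30), (58, 22), (96, 34), (94, 52), (56, 41), (14, 49)]
box = axis_aligned_box(shrunk_polygon)

# A larger k gives a sharper transition around the threshold t.
soft = differentiable_binarization(0.6, 0.5, k=10.0)
sharp = differentiable_binarization(0.6, 0.5, k=50.0)
```

In this sketch the word-level label is simply the bounding rectangle of the polygon's vertices; how the polygon is shrunk beforehand is left out.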
