CV · Feb 15, 2020

Scale-Invariant Multi-Oriented Text Detection in Wild Scene Images

arXiv:2002.06423v1
Originality: Highly original
AI Analysis

This paper addresses the challenge of detecting text in complex, real-world images for applications such as document analysis and scene understanding. It represents an incremental advance with specific performance gains.

The paper tackles the detection of multi-oriented text in wild scene images by proposing a fully convolutional neural network with a novel Feature Representation Block, and reports significant improvements over state-of-the-art results on benchmark datasets such as ICDAR 2015 and COCO-Text.

Automatic detection of scene text in the wild is a challenging problem, particularly because of the difficulty of handling (i) occlusions of varying percentages, (ii) widely different scales and orientations, and (iii) severe degradations in image quality, among others. In this article, we propose a fully convolutional neural network architecture containing a novel Feature Representation Block (FRB) capable of efficient abstraction of information. The proposed network is trained using curriculum learning with respect to the difficulty of image samples and gradual pixel-wise blurring. It is capable of detecting text of different scales and orientations affected by blurring from multiple possible sources, non-uniform illumination, and partial occlusions of varying percentages. The text detection performance of the proposed framework on benchmark databases including ICDAR 2015, ICDAR 2017 MLT, COCO-Text and MSRA-TD500 improves significantly on the respective state-of-the-art results. Source code of the proposed architecture will be made available on GitHub.
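The abstract does not say how the "gradual pixel-wise blurring" curriculum is scheduled. As a rough illustration only, the sketch below assumes a linear ramp of blur strength over epochs and uses a simple mean (box) filter as a stand-in for whatever blur the authors apply. The names `blur_radius_for_epoch`, `box_blur`, `curriculum_epochs` and the parameter `max_radius` are hypothetical and do not come from the paper.

```python
def blur_radius_for_epoch(epoch, total_epochs, max_radius=3):
    """Assumed schedule: ramp the blur radius linearly from 0 (sharp) to max_radius."""
    if total_epochs <= 1:
        return max_radius
    return round(epoch * max_radius / (total_epochs - 1))


def box_blur(image, radius):
    """Blur a 2D grayscale image (list of rows) with a (2r+1)x(2r+1) mean filter.

    Border pixels average over the part of the window that lies inside the image.
    This is a stand-in blur, not the degradation model used in the paper.
    """
    if radius == 0:
        return [row[:] for row in image]
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def curriculum_epochs(samples, total_epochs, max_radius=3):
    """Yield (epoch, radius, blurred_samples): sharp images first, most blurred last."""
    for epoch in range(total_epochs):
        radius = blur_radius_for_epoch(epoch, total_epochs, max_radius)
        yield epoch, radius, [box_blur(img, radius) for img in samples]
```

In a training loop, each `blurred_samples` list would replace the raw images for that epoch, so the detector sees progressively harder inputs. Ordering by sample difficulty, which the abstract also mentions, is not modeled here.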

