R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
The work addresses robust text detection in real-world images, an important problem for computer vision applications, but its contribution is incremental because it builds on the existing Faster R-CNN architecture.
The paper tackles the problem of detecting arbitrarily oriented text in natural scene images by proposing R2CNN, a Faster R-CNN-based method that predicts inclined bounding boxes from axis-aligned region proposals, and reports competitive results on the ICDAR 2015 and ICDAR 2013 benchmarks.
In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrarily oriented text in natural scene images. The framework is based on the Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose text at different orientations. Second, for each axis-aligned text box proposed by the RPN, we extract pooled features at several pooled sizes, and the concatenated features are used to simultaneously predict the text/non-text score, the axis-aligned box, and the inclined minimum-area box. Finally, we apply inclined non-maximum suppression to obtain the detection results. Our approach achieves competitive results on the ICDAR 2015 and ICDAR 2013 text detection benchmarks.
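The final step, inclined non-maximum suppression, can be understood as ordinary greedy NMS in which overlap is measured as the IoU between rotated boxes rather than axis-aligned ones. The Python sketch below illustrates that idea under several assumptions that do not come from the paper: boxes are four-corner polygons built from a hypothetical (cx, cy, w, h, angle) parameterization (not necessarily the representation R2CNN regresses), rotated IoU is computed with Sutherland-Hodgman clipping plus the shoelace formula, and the 0.3 threshold is only an illustrative default. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rect_corners(cx: float, cy: float, w: float, h: float, angle_deg: float) -> List[Point]:
    """Corners of a rotated rectangle, counter-clockwise (hypothetical parameterization)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; abs() makes the result independent of vertex order."""
    acc = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0


def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies on the left of (or on) the directed edge a -> b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of the line through p, q with the line through a, b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` by the convex, counter-clockwise `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:  # polygons are disjoint; nothing left to clip
            break
        prev = inp[-1]
        for cur in inp:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output


def rotated_iou(a: Sequence[Point], b: Sequence[Point]) -> float:
    """IoU of two convex polygons, e.g. two inclined text boxes."""
    inter = clip_convex(a, b)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    union = polygon_area(a) + polygon_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def inclined_nms(boxes: Sequence[Sequence[Point]], scores: Sequence[float],
                 iou_thresh: float = 0.3) -> List[int]:
    """Greedy NMS that uses rotated IoU; returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining candidates that overlap the kept box too much.
        order = [j for j in order if rotated_iou(boxes[best], boxes[j]) <= iou_thresh]
    return keep


boxes = [
    rect_corners(0, 0, 10, 4, 30),     # high-scoring inclined box
    rect_corners(0.5, 0, 10, 4, 30),   # near-duplicate of the first box
    rect_corners(50, 50, 10, 4, -20),  # separate text instance
]
print(inclined_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box is suppressed because its rotated IoU with the first is roughly 0.8, while the distant third box does not overlap at all. This pure-Python version is quadratic in the number of candidates and is meant only to show the logic; a practical detector would likely use a vectorized implementation of rotated IoU.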