CV | Sep 6, 2022

A Masked Bounding-Box Selection Based ResNet Predictor for Text Rotation Prediction

arXiv:2209.09198v1 | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem for OCR systems by improving rotation prediction accuracy, though it appears incremental as it builds on existing deep learning methods with a novel focus mechanism.

The paper tackles text rotation prediction in OCR systems, where existing predictors degrade under background noise. It introduces a masked bounding-box selection method that uses bounding-box information to focus the predictor on text regions, and it reports a large-margin improvement in performance.

Existing Optical Character Recognition (OCR) systems can recognize images with horizontal text. However, as the rotation of the text increases, the text becomes harder to recognize and the performance of OCR systems decreases. Predicting the rotation of the text and correcting the image is therefore important. Previous work mainly uses traditional Computer Vision methods such as the Hough Transform and Deep Learning methods such as Convolutional Neural Networks. However, all of these methods are prone to the background noise that commonly appears in general images containing text. To tackle this problem, we introduce a new masked bounding-box selection method that incorporates bounding-box information into the system. Trained to focus on the bounding box as the region of interest (ROI), the ResNet predictor learns to overlook the background noise. Evaluations on text rotation prediction tasks show that our method improves performance by a large margin.
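
The abstract does not spell out how the mask is built, so the following is only a minimal sketch of one plausible reading: pixels outside the text bounding boxes are replaced with a constant before the image reaches the rotation predictor. The function name `mask_outside_boxes`, the `(x_min, y_min, x_max, y_max)` box format with exclusive upper bounds, and the zero fill value are all assumptions made for illustration, not details taken from the paper. The image is a plain nested list to keep the example dependency-free; a real pipeline would more likely use array operations.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), upper bounds exclusive.
Box = Tuple[int, int, int, int]


def mask_outside_boxes(
    image: Sequence[Sequence[int]],
    boxes: Sequence[Box],
    fill: int = 0,
) -> List[List[int]]:
    """Return a copy of a grayscale image with every pixel outside the boxes set to `fill`.

    This is an assumed interpretation of "masked bounding-box selection":
    only the regions of interest keep their original pixel values.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Start from a fully masked image, then copy the box regions back in.
    masked = [[fill] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip the box to the image so out-of-range coordinates are harmless.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        for y in range(y0, y1):
            masked[y][x0:x1] = image[y][x0:x1]
    return masked


if __name__ == "__main__":
    image = [[9] * 4 for _ in range(4)]          # uniform 4x4 "image"
    for row in mask_outside_boxes(image, [(1, 1, 3, 3)]):
        print(row)
```

Under this reading, the masked image (rather than the raw one) would be passed to the ResNet predictor, so background pixels carry no signal that could distract it from the text region.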

