Improving Rotated Text Detection with Rotation Region Proposal Networks
This work addresses the need for robust scene-text understanding, both to combat misinformation shared as text overlaid on images and to aid the visually impaired. The contribution is incremental in that it builds on existing systems.
The authors tackled the problem of detecting rotated text in images by extending Facebook's Rosetta system with Rotation Region Proposal Networks (RRPN), resulting in a significant improvement in detection performance.
A significant number of images shared on social media platforms such as Facebook and Instagram contain text in various forms. It is becoming increasingly common for bad actors to share misinformation, hate speech, or other kinds of harmful content as text overlaid on images on such platforms. A scene-text understanding system should hence be able to handle text in the various orientations that an adversary might use. Moreover, such a system can be incorporated into screen readers used to aid the visually impaired. In this work, we extend Rosetta, the scene-text extraction system at Facebook, to efficiently handle text in various orientations. Specifically, we incorporate Rotation Region Proposal Networks (RRPN) into our text extraction pipeline and offer practical suggestions for building and deploying a model that efficiently detects and recognizes text in arbitrary orientations. Experimental results show a significant improvement in detecting rotated text.
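To make the idea of a rotated region concrete, the sketch below shows one common way to describe a rotated box: a center, a width, a height, and an angle, converted here into four corner points. This is only an illustration and not the Rosetta or RRPN implementation. The function name `rotated_box_corners`, the parameter order, and the angle convention (degrees, standard 2D rotation) are assumptions chosen for the example.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h box centered at (cx, cy).

    Hypothetical helper for illustration. The angle convention is an
    assumption: a standard 2D rotation, which appears clockwise in image
    coordinates where the y axis points down.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset, then translate it back to the box center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]
```

With an angle of zero this reduces to an ordinary axis-aligned box, which is why an axis-aligned detector can be seen as a special case of a rotated one. The extra angle parameter is what lets a proposal fit tilted text tightly instead of enclosing it in a larger upright rectangle.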