CV · Jun 10, 2021

Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition

Ryota Yoshihashi, Tomohiro Tanaka, Kenji Doi, Takumi Fujino, Naoaki Yamashita

arXiv:2106.05611v1 · 1.4

Originality: Incremental advance

AI Analysis

This work addresses the need for lightweight, real-time text spotting on mobile platforms and offers a practical solution for building stand-alone OCR applications, though its contribution, which simplifies existing methods, is incremental.

The paper tackles the heavy computation of end-to-end text spotting, an obstacle to mobile deployment, by proposing Context-Free TextSpotter, which uses simple convolutions and minimal post-processing. With only three million parameters, it achieves real-time performance on a GPU with acceptable quality degradation. The authors also demonstrate that it runs on a smartphone with affordable latency, making it suitable for stand-alone OCR applications.

In the deployment of scene-text spotting systems on mobile platforms, lightweight models with low computation are preferable. In principle, end-to-end (E2E) text spotting is suitable for such purposes because it performs text detection and recognition in a single model. However, current state-of-the-art E2E methods rely on heavy feature extractors, recurrent sequence modeling, and complex shape aligners to pursue accuracy, so their computation remains heavy. We explore the opposite direction: how far can we go without bells and whistles in E2E text spotting? To this end, we propose Context-Free TextSpotter, a text-spotting method that consists of simple convolutions and a few post-processing steps. Experiments on standard benchmarks show that Context-Free TextSpotter achieves real-time text spotting on a GPU with only three million parameters, making it the smallest and fastest among existing deep text spotters, with acceptable degradation in transcription quality compared to heavier ones. Further, we demonstrate that our text spotter can run on a smartphone with affordable latency, which is valuable for building stand-alone OCR applications.
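To make "context-free" recognition concrete, here is a minimal Python sketch of reading a word without any recurrent sequence model. It assumes the detector has already produced character candidates, each with a horizontal position and a per-class score vector from a local classifier. The names `CharCandidate`, `decode_word`, `ALPHABET`, `one_hot`, and the 0.5 threshold are hypothetical illustrations, not the paper's API or its exact decoder.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical alphabet; index 0 is reserved for "background / no character".
ALPHABET = "_abcdefghijklmnopqrstuvwxyz0123456789"


@dataclass
class CharCandidate:
    x: float                 # horizontal center of the candidate within its word
    scores: Sequence[float]  # per-class scores from a local (context-free) classifier


def decode_word(candidates: List[CharCandidate], min_score: float = 0.5) -> str:
    """Read a word by classifying each character independently.

    No recurrent model or language context is used: each label depends only on
    its own candidate's scores, and the word is formed by ordering labels left to right.
    """
    chars = []
    for cand in sorted(candidates, key=lambda c: c.x):
        best = max(range(len(cand.scores)), key=lambda i: cand.scores[i])
        # Skip background predictions and low-confidence characters.
        if best == 0 or cand.scores[best] < min_score:
            continue
        chars.append(ALPHABET[best])
    return "".join(chars)


def one_hot(ch: str, conf: float = 0.9) -> List[float]:
    """Hypothetical helper: a score vector peaking at character `ch`."""
    scores = [0.0] * len(ALPHABET)
    scores[ALPHABET.index(ch)] = conf
    return scores


# Candidates arrive in arbitrary order; decoding sorts them by x.
example = [
    CharCandidate(30.0, one_hot("t")),
    CharCandidate(10.0, one_hot("c")),
    CharCandidate(20.0, one_hot("a")),
]
print(decode_word(example))  # prints cat
```

Because each label depends only on its own candidate, this decoder needs no recurrent state or language model; the trade-off is that it cannot use neighboring characters to correct an ambiguous one.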
