Efficient Scene Text Detection with Textual Attention Tower
This work addresses computational efficiency in scene text detection for applications such as image analysis. The contribution is incremental, since it builds on existing methods.
The paper tackles efficient scene text detection by proposing a shallower network that uses a feature fusion mechanism to reduce computational complexity and a self-attention mechanism to suppress false positives. On benchmarks including ICDAR 2013, ICDAR 2015, and MSRA-TD500, it achieves better or comparable performance with fewer parameters and lower computational cost.
Scene text detection has received attention for years and has achieved impressive performance across various benchmarks. In this work, we propose an efficient and accurate approach to detecting multi-oriented text in scene images. The proposed feature fusion mechanism allows us to use a shallower network, which reduces the computational complexity. A self-attention mechanism is adopted to suppress false positive detections. Experiments on public benchmarks including ICDAR 2013, ICDAR 2015, and MSRA-TD500 show that our proposed approach achieves better or comparable performance with fewer parameters and less computational cost.
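The abstract does not specify how the fusion or attention modules are built, so the following is only a minimal NumPy sketch of the two general ideas, not the authors' architecture. It assumes fusion by nearest-neighbour upsampling plus channel concatenation, and generic scaled dot-product self-attention over spatial positions with a residual connection. The function names (`fuse_features`, `self_attention`), the tensor shapes, and the random projection matrices are all hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(shallow, deep):
    # Assumed fusion scheme: upsample the coarser map 2x (nearest neighbour)
    # and concatenate it with the finer map along the channel axis.
    # shallow: (C1, H, W), deep: (C2, H/2, W/2) -> (C1 + C2, H, W)
    up = deep.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([shallow, up], axis=0)

def self_attention(feat, w_q, w_k, w_v):
    # Generic scaled dot-product self-attention over the H*W positions.
    # feat: (C, H, W); w_q, w_k: (C, d); w_v: (C, C)
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (N, C), N = H*W
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, N)
    out = (attn @ v).T.reshape(c, h, w)          # back to (C, H, W)
    return feat + out, attn                      # residual connection

# Illustrative shapes only; real weights would be learned, not random.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))
deep = rng.standard_normal((16, 8, 8))
fused = fuse_features(shallow, deep)

d = 4
w_q = rng.standard_normal((24, d)) * 0.1
w_k = rng.standard_normal((24, d)) * 0.1
w_v = rng.standard_normal((24, 24)) * 0.1
attended, attn = self_attention(fused, w_q, w_k, w_v)
print(fused.shape, attended.shape, attn.shape)
```

In this sketch each spatial position re-weights its features using context from every other position, which is one way a detector could down-weight isolated text-like responses. Note that the attention matrix grows quadratically with the number of positions, so a practical design would apply it on a low-resolution map.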