RRPN++: Guidance Towards More Accurate Scene Text Detection
RRPN++ is an incremental improvement to the RRPN scene text detector, raising accuracy and inference speed for applications such as document analysis and image understanding.
The paper addresses the limited detection accuracy of RRPN by introducing RRPN++, which combines anchor-free proposal generation with multi-task learning that incorporates recognition, yielding a 6% F-measure gain over RRPN on ICDAR2015 and improved efficiency.
RRPN is among the outstanding scene text detection approaches, but its manually designed anchors and coarse proposal refinement leave its performance far from perfect. In this paper, we propose RRPN++, which exploits the potential of RRPN-based models through several improvements. Building on RRPN, we propose the Anchor-free Pyramid Proposal Networks (APPN) to generate first-stage proposals; its anchor-free design reduces the number of proposals and accelerates inference. In the second stage, a detection branch and a recognition branch are combined to perform multi-task learning. At inference time, the detection branch outputs the refined proposals and the recognition branch predicts the transcript of each refined text region. Furthermore, the recognition branch helps rescore the proposals and eliminate false positives through a joint filtering strategy. With these enhancements, we improve the detection results by $6\%$ in F-measure on ICDAR2015 compared to RRPN. Experiments on other benchmarks also demonstrate the superior performance and efficiency of our model.
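The abstract does not spell out how the recognition branch rescores proposals, so the following is only a minimal sketch of one plausible form of joint filtering. It assumes that each refined proposal carries a detection score and a list of per-character recognition probabilities, that the two are blended by a weighted average, and that proposals below a threshold are discarded. The names `Proposal`, `recognition_confidence`, `joint_filter`, `det_weight`, and `threshold` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Proposal:
    """A refined rotated text region (hypothetical structure, not the paper's)."""
    box: Tuple[float, float, float, float, float]  # (cx, cy, w, h, angle)
    det_score: float                               # detection-branch confidence in [0, 1]
    char_probs: List[float] = field(default_factory=list)  # per-character recognition probabilities


def recognition_confidence(char_probs: List[float]) -> float:
    """Mean per-character probability; 0.0 when no characters were recognized."""
    if not char_probs:
        return 0.0
    return sum(char_probs) / len(char_probs)


def joint_filter(
    proposals: List[Proposal],
    det_weight: float = 0.5,   # assumed blending weight, not from the paper
    threshold: float = 0.5,    # assumed cut-off, not from the paper
) -> List[Tuple[Proposal, float]]:
    """Rescore each proposal with a weighted average of detection and
    recognition confidence, drop those below the threshold, and return
    the survivors sorted by joint score (highest first)."""
    kept = []
    for p in proposals:
        rec_conf = recognition_confidence(p.char_probs)
        joint = det_weight * p.det_score + (1.0 - det_weight) * rec_conf
        if joint >= threshold:
            kept.append((p, joint))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    candidates = [
        Proposal((100, 50, 80, 20, -5.0), det_score=0.9, char_probs=[0.9, 0.8]),
        # High detection score but unreadable content: likely a false positive.
        Proposal((30, 200, 60, 18, 12.0), det_score=0.7, char_probs=[0.1, 0.2]),
        # Nothing recognized at all.
        Proposal((220, 90, 40, 15, 0.0), det_score=0.6, char_probs=[]),
    ]
    for proposal, score in joint_filter(candidates):
        print(proposal.box, round(score, 3))
```

Under these assumptions, a region that the detector scores highly but the recognizer cannot read is pushed below the threshold, which is the intuition behind using recognition to suppress false positives. The actual scoring rule in RRPN++ may differ.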