CV, LG · Nov 21, 2018

A Novel Integrated Framework for Learning both Text Detection and Recognition

Wanchen Sui, Qing Zhang, Jun Yang, Wei Chu

arXiv:1811.08611v1 · 3.94 citations

Originality: Incremental advance

AI Analysis

This work addresses the isolated training of text detection and recognition models in computer vision. It offers a more efficient solution for applications such as document analysis and scene text understanding, though the advance is incremental because it builds on existing methods by integrating them.

The paper tackles the problem of text detection and recognition being treated as separate tasks by proposing an integrated end-to-end framework that shares parameters between detection and recognition models, resulting in improved accuracy and reduced computational load.
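The parameter-sharing idea can be illustrated with a minimal sketch. Everything named here (`Backbone`, `detect`, `recognize`, `joint_loss`, and the loss weight) is a hypothetical stand-in rather than the paper's actual architecture. The sketch only shows why sharing the feature extractor halves the number of backbone passes at inference, and how one combined objective could drive both tasks.

```python
class Backbone:
    """Toy stand-in for a shared feature extractor (hypothetical)."""

    def __init__(self):
        self.calls = 0  # counts forward passes, to compare the two pipelines

    def __call__(self, image):
        self.calls += 1
        # Toy "features": one number per image row.
        return [sum(row) for row in image]


def detect(features):
    # Toy detection head: rows with a non-zero feature count as text regions.
    return [i for i, f in enumerate(features) if f > 0]


def recognize(features, regions):
    # Toy recognition head: read out the feature of each detected region.
    return [features[i] for i in regions]


def separate_pipeline(image, det_backbone, rec_backbone):
    # Isolated models: each task runs its own feature extractor.
    regions = detect(det_backbone(image))
    return recognize(rec_backbone(image), regions)


def shared_pipeline(image, backbone):
    # Integrated model: features are computed once and reused by both heads.
    features = backbone(image)
    regions = detect(features)
    return recognize(features, regions)


def joint_loss(det_loss, rec_loss, rec_weight=1.0):
    # Assumed form of a joint objective: a weighted sum of the two task losses.
    return det_loss + rec_weight * rec_loss
```

In this toy setup both pipelines return the same result, but the shared pipeline calls the backbone once instead of twice. The weighted-sum loss is an assumption; the abstract does not state how the two losses are combined.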

In this paper, we propose a novel integrated framework for learning both text detection and recognition. Most existing methods treat detection and recognition as two isolated tasks and train them separately, since the detection and recognition models have different parameters and each model optimizes its own loss function during its individual training process. In contrast to those methods, by sharing model parameters, we merge the detection model and recognition model into a single end-to-end trainable model and train the joint model for the two tasks simultaneously. The shared parameters not only reduce the computational load during inference, but also improve end-to-end text detection-recognition accuracy. In addition, we design a simpler and faster sequence learning method for the recognition network, based on a succession of stacked convolutional layers without any recurrent structure; this proves feasible and dramatically improves inference speed. Extensive experiments on different datasets demonstrate that the proposed method achieves very promising results.
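To make the idea of sequence learning with stacked convolutions more concrete, the following sketch shows how context along a text line can grow with depth without any recurrent state. The kernel sizes, strides, and function names are assumptions chosen for illustration; the abstract does not give the paper's layer configuration.

```python
def conv1d(sequence, kernel):
    """Plain 'valid' 1D convolution (cross-correlation) over a list of numbers."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def stacked_conv(sequence, kernels):
    # Apply the layers one after another; each output position depends only on
    # a fixed window of the input, so all positions can be computed in parallel.
    for kernel in kernels:
        sequence = conv1d(sequence, kernel)
    return sequence


def receptive_field(layers):
    """Input positions seen by one output unit.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # each layer widens the window
        jump *= stride                     # strides spread later windows apart
    return field


# Four stacked layers with kernel size 3 and stride 1 see 9 input positions.
print(receptive_field([(3, 1)] * 4))
```

A recurrent layer must process positions one after another, whereas each convolutional layer above is a fixed-window operation over the whole sequence. That is one plausible reading of why a convolution-only recognizer could be faster at inference.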
