Handwritten digit string recognition by combination of residual network and RNN-CTC
This work addresses the recognition of handwritten digit sequences. The contribution is incremental, since it combines existing methods (residual networks, RNN, CTC) for a specific domain.
The authors tackled handwritten digit string recognition by combining a residual network, RNN, and CTC into an end-to-end trainable model, achieving recognition rates of 89.75% on ORAND-CAR-A and 91.14% on ORAND-CAR-B.
Recurrent neural networks (RNNs) and connectionist temporal classification (CTC) have shown success in many sequence labeling tasks, owing to their strong ability to handle problems where the alignment between the inputs and the target labels is unknown. The residual network is a recent convolutional neural network structure that works well in various computer vision tasks. In this paper, we take advantage of these architectures to create a new network for handwritten digit string recognition. First, we design a residual network to extract features from input images; then we employ an RNN to model the contextual information within the feature sequences and predict recognition results. At the top of this network, a standard CTC is applied to calculate the loss and yield the final results. These three parts compose an end-to-end trainable network. The proposed architecture achieves the highest performance on ORAND-CAR-A and ORAND-CAR-B, with recognition rates of 89.75% and 91.14%, respectively. In addition, experiments on a generated captcha dataset with much longer strings show the potential of the proposed network to handle long strings.
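To make the role of the CTC layer concrete, the following is a minimal pure-Python sketch of the two things a standard CTC does on top of per-frame class probabilities: scoring a target digit string (the loss) and collapsing a frame-wise prediction into a string (the result). It is not the authors' implementation. The function names, the choice of blank index, and the use of greedy best-path decoding are assumptions made for illustration; the abstract only states that a standard CTC is applied.

```python
import math
from itertools import groupby
from typing import List, Sequence

# Assumption: classes 0-9 are the digits and class 10 is the CTC blank.
BLANK = 10


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


def ctc_label_probability(frame_probs: Sequence[Sequence[float]],
                          target: Sequence[int],
                          blank: int = BLANK) -> float:
    """CTC forward algorithm: total probability of every frame-level
    alignment that collapses to `target`."""
    # Extended target: a blank before, between, and after the labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # Initialisation: an alignment may start with a blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = frame_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for probs in frame_probs[1:]:
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


def ctc_loss(frame_probs: Sequence[Sequence[float]],
             target: Sequence[int],
             blank: int = BLANK) -> float:
    """Negative log-likelihood of `target` under the CTC model."""
    return -math.log(ctc_label_probability(frame_probs, target, blank))
```

Here `frame_probs` stands for the per-frame softmax outputs that the RNN would produce over the feature sequence. Because the forward algorithm sums over all alignments, no frame-level segmentation of the digit string is needed, which is the property the abstract relies on. A real training setup would work in log space for numerical stability and use a framework's differentiable CTC loss rather than this sketch.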