Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
This work addresses the costly line-level annotation and error-prone line segmentation in handwriting recognition for document digitization. It is an incremental improvement that extends existing MDLSTM-RNNs.
The paper tackles offline handwriting recognition by proposing an end-to-end model that jointly segments and transcribes handwritten paragraphs, eliminating the need for pre-segmented line images. The model achieves results on the Rimes and IAM databases that are competitive with those of networks trained at line level.
Offline handwriting recognition systems require cropped text line images for both training and recognition. On the one hand, the annotation of position and transcript at line level is costly to obtain. On the other hand, automatic line segmentation algorithms are prone to errors, compromising the subsequent recognition. In this paper, we propose a modification of the popular and efficient multi-dimensional long short-term memory recurrent neural networks (MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More specifically, we replace the collapse layer, which transforms the two-dimensional representation into a sequence of predictions, with a recurrent version that can recognize one line at a time. In the proposed model, a neural network performs a kind of implicit line segmentation by computing attention weights on the image representation. The experiments on paragraphs of the Rimes and IAM databases yield results that are competitive with those of networks trained at line level, and constitute a significant step towards end-to-end transcription of full documents.
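To make the change to the collapse layer concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It contrasts a standard collapse (summing the feature map over the vertical axis) with an attention-weighted collapse applied once per text line. Several details are assumptions made for illustration: the attention scores are passed in as a plain array rather than computed by a network, the weights are normalized with a softmax over the vertical axis of each column, and the names `standard_collapse`, `weighted_collapse`, and `read_paragraph` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_collapse(features):
    """Sum an H x W x D feature map over the vertical axis.

    Returns a W x D sequence, which only makes sense for a single text line.
    """
    height, width, depth = len(features), len(features[0]), len(features[0][0])
    return [
        [sum(features[i][j][d] for i in range(height)) for d in range(depth)]
        for j in range(width)
    ]


def weighted_collapse(features, scores):
    """Attention-weighted collapse for one step (one text line).

    `scores` is an H x W array of unnormalized attention scores. Assumption:
    the weights are normalized over the vertical axis within each column.
    """
    height, width, depth = len(features), len(features[0]), len(features[0][0])
    sequence = []
    for j in range(width):
        weights = softmax([scores[i][j] for i in range(height)])
        sequence.append(
            [sum(weights[i] * features[i][j][d] for i in range(height))
             for d in range(depth)]
        )
    return sequence


def read_paragraph(features, scores_per_step):
    """Apply the weighted collapse once per step, yielding one sequence per line."""
    return [weighted_collapse(features, scores) for scores in scores_per_step]


# Toy 2 x 2 x 1 feature map: row 0 stands for the first line, row 1 for the second.
features = [[[1.0], [2.0]],
            [[10.0], [20.0]]]
# Hypothetical scores: step 1 attends to row 0, step 2 attends to row 1.
scores_per_step = [
    [[50.0, 50.0], [-50.0, -50.0]],
    [[-50.0, -50.0], [50.0, 50.0]],
]
lines = read_paragraph(features, scores_per_step)
```

In this toy example the standard collapse mixes both rows into a single sequence, whereas each weighted step isolates one row. In the proposed model the scores would come from a trained network and change at every step, which is what allows it to read one line at a time without an explicit segmentation.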