Chargrid-OCR: End-to-end Trainable Optical Character Recognition for Printed Documents using Instance Segmentation
This addresses OCR for printed documents, offering improved accuracy and speed, but it is incremental as it builds on existing segmentation and detection methods.
The paper tackles Optical Character Recognition (OCR) for printed documents by proposing an end-to-end trainable model that predicts a character grid and boxes, achieving state-of-the-art accuracy and faster performance on GPU.
We present an end-to-end trainable approach for Optical Character Recognition (OCR) on printed documents. Specifically, we propose a model that predicts a) a two-dimensional character grid (\emph{chargrid}) representation of a document image as a semantic segmentation task and b) character boxes for delineating character instances as an object detection task. For training the model, we build two large-scale datasets without resorting to any manual annotation - synthetic documents with clean labels and real documents with noisy labels. We demonstrate experimentally that our method, trained on the combination of these datasets, (i) outperforms previous state-of-the-art approaches in accuracy (ii) is easily parallelizable on GPU and is, therefore, significantly faster and (iii) is easy to train and adapt to a new domain.