CV · Jun 21, 2021

An End-to-End Khmer Optical Character Recognition using Sequence-to-Sequence with Attention

arXiv:2106.10875v1 · 1.4

Originality: Incremental advance

AI Analysis

This work addresses OCR for the Khmer language and represents an incremental improvement over existing methods.

The paper tackles Khmer optical character recognition by proposing an end-to-end deep convolutional recurrent neural network with a sequence-to-sequence architecture and an attention mechanism. On a 3000-image test set, it achieves a character error rate of 1%, compared with 3% for the state-of-the-art Tesseract OCR engine.
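The CER figures above are edit-distance ratios. The following is a minimal sketch of how such a number can be computed, assuming corpus-level aggregation (total Levenshtein edits divided by total reference characters) and assuming a "character" is one Unicode code point. The paper's exact aggregation and character unit are not stated here; Khmer clusters with subscript consonants and vowel signs span several code points, so the authors' unit may differ. The names `levenshtein` and `cer` are illustrative, not from the paper.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution.

    Iterating a Python str walks Unicode code points, so Khmer combining
    marks are counted as separate characters in this sketch (an assumption).
    """
    prev = list(range(len(hyp) + 1))  # distances from "" to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits / total reference characters (assumed)."""
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(cer(["abcd"], ["abxd"]))           # 0.25
```

Under this definition, a CER of 1% means roughly one edit per hundred reference characters across the test set.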

This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of residual convolutional blocks and a layer of gated recurrent units (GRU). These features are encoded into a single context vector and a sequence of hidden states, which are fed to the decoder. The decoder then emits one character at a time until it produces a special end-of-sentence (EOS) token. The attention mechanism allows the decoder to adaptively select parts of the input image while predicting each target character. The Seq2Seq Khmer OCR network was trained on a large collection of computer-generated text-line images in seven common Khmer fonts. On the 3000-image test set, the proposed model outperformed the state-of-the-art Tesseract OCR engine for Khmer, achieving a character error rate (CER) of 1% versus 3%.
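The attention-based decoding loop described above can be illustrated with a small, dependency-free sketch. It assumes dot-product attention for scoring (the paper's scoring function may differ) and greedy decoding. `toy_step` is a hypothetical stand-in for the trained GRU decoder and output layer, not the authors' code, and the token ids `SOS_ID`/`EOS_ID` are illustrative. At each step the current decoder state attends over the encoder hidden states, the resulting context vector feeds the next prediction, and decoding stops at EOS or after `max_len` steps.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query: Vector, keys: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Return (weights, context) for one decoder step.

    query: current decoder hidden state.
    keys: encoder hidden states, one per horizontal position of the text line.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


Step = Callable[[int, Vector, Vector], Tuple[int, Vector]]


def greedy_decode(step: Step, init_state: Vector, encoder_states: Sequence[Vector],
                  sos_id: int, eos_id: int, max_len: int = 100) -> List[int]:
    """Emit one token at a time until EOS (or max_len) is reached."""
    state, prev, out = init_state, sos_id, []
    for _ in range(max_len):
        _, context = dot_attention(state, encoder_states)
        prev, state = step(prev, state, context)
        if prev == eos_id:
            break
        out.append(prev)
    return out


# --- Hypothetical toy setup, for illustration only ---
SOS_ID, EOS_ID = 4, 0
ENCODER_STATES = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def toy_step(prev: int, state: Vector, context: Vector) -> Tuple[int, Vector]:
    # Stand-in for the GRU decoder: emit the attended position's token id
    # (1-based) and move the "focus" one position to the right.
    if max(context) < 0.5:  # attention is no longer focused: stop
        return EOS_ID, state
    pos = context.index(max(context))
    nxt = [5.0 if i == pos + 1 else 0.0 for i in range(len(state))]
    return pos + 1, nxt


if __name__ == "__main__":
    print(greedy_decode(toy_step, [5.0, 0.0, 0.0], ENCODER_STATES, SOS_ID, EOS_ID))
```

Running the module prints `[1, 2, 3]`: the toy decoder's attention sweeps across the three encoder positions, and once the state no longer matches any position, attention spreads uniformly and the step function emits EOS. A trained model would instead learn both the state update and when to emit EOS.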

