CV CL LG Dec 17, 2019

Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network

Hazrat Ali, Ahsan Ullah, Talha Iqbal, Shahid Khattak

arXiv:1912.07943v1, 30 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of resources and methods for Urdu handwriting recognition, with applications in postal services, banking, and manuscript preservation. Its contribution is incremental on the methods side, since it applies existing architectures to a new dataset.

The paper tackles the problem of automatic recognition of Urdu handwritten digits and characters by introducing a new dataset from over 900 individuals and evaluating deep autoencoder and convolutional neural network frameworks, achieving accuracies up to 97% for digits and 86.5% for characters.

Automatic recognition of Urdu handwritten digits and characters is a challenging task. It has applications in postal address reading, bank cheque processing, and the digitization and preservation of old handwritten manuscripts. While there is a significant body of work on automatic recognition of handwritten characters in English and other major languages of the world, the work done for the Urdu language is extremely insufficient. This paper has two goals. Firstly, we introduce a pioneer dataset for handwritten digits and characters of Urdu, containing samples from more than 900 individuals. Secondly, we report results for automatic recognition of handwritten digits and characters achieved using a deep autoencoder network and a convolutional neural network. More specifically, we use a two-layer and a three-layer deep autoencoder network and a convolutional neural network, and evaluate the two frameworks in terms of recognition accuracy. The proposed deep autoencoder framework recognizes digits and characters with an accuracy of 97% for digits only, 81% for characters only, and 82% for both digits and characters simultaneously. In comparison, the convolutional neural network framework achieves an accuracy of 96.7% for digits only, 86.5% for characters only, and 82.7% for both digits and characters simultaneously. These frameworks can serve as baselines for future research on Urdu handwritten text.
