Mallikarjun Hangarge


2 Papers

CV, Mar 13, 2013

Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents

Mallikarjun Hangarge, K. C. Santosh, Srikanth Doddamani et al.

In this paper, we use statistical texture features for handwritten and printed text classification, primarily aiming at word-level classification in South Indian scripts. Words are first extracted from the scanned document. For each extracted word, statistical texture features such as mean, standard deviation, smoothness, moment, uniformity, entropy, local range and local entropy are computed. These feature vectors are then used to classify the words with a k-NN classifier. We have validated the approach on several different datasets, primarily employing Kannada, Telugu, Malayalam and Hindi (Devanagari) scripts, for which an average classification rate of 99.26% is achieved. In addition, to demonstrate the extensibility of the approach, we apply it to Roman script using a publicly available dataset and report interesting results.
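A minimal sketch of this pipeline, assuming words are already segmented and given as grayscale images (lists of integer pixel rows, 0 to 255). The histogram descriptors follow common textbook definitions (smoothness = 1 - 1/(1 + variance), with the variance and third moment normalized by (L - 1)^2); local range and local entropy are averaged over 3 x 3 neighborhoods; and the classifier is plain k-NN with k = 3 and Euclidean distance. These choices and all function names are illustrative assumptions, not the authors' settings.

```python
import math
from collections import Counter


def histogram_features(pixels, levels=256):
    """Histogram-moment texture descriptors of a grayscale word image.

    pixels: flat list of integer gray levels in [0, levels - 1].
    Returns [mean, std, smoothness, third_moment, uniformity, entropy].
    """
    n = len(pixels)
    p = {z: c / n for z, c in Counter(pixels).items()}  # normalized histogram
    mean = sum(z * pz for z, pz in p.items())
    var = sum((z - mean) ** 2 * pz for z, pz in p.items())
    scale = (levels - 1) ** 2                      # assumed normalization constant
    smoothness = 1 - 1 / (1 + var / scale)         # 0 for a constant region
    third_moment = sum((z - mean) ** 3 * pz for z, pz in p.items()) / scale
    uniformity = sum(pz ** 2 for pz in p.values())           # 1 for a constant region
    entropy = -sum(pz * math.log2(pz) for pz in p.values())  # in bits
    return [mean, math.sqrt(var), smoothness, third_moment, uniformity, entropy]


def local_features(img, r=1):
    """Mean local range and mean local entropy over (2r+1) x (2r+1) windows,
    with windows clipped at the image border (window size is an assumption)."""
    h, w = len(img), len(img[0])
    ranges, entropies = [], []
    for i in range(h):
        for j in range(w):
            win = [img[y][x]
                   for y in range(max(0, i - r), min(h, i + r + 1))
                   for x in range(max(0, j - r), min(w, j + r + 1))]
            ranges.append(max(win) - min(win))
            m = len(win)
            entropies.append(-sum(c / m * math.log2(c / m) for c in Counter(win).values()))
    return [sum(ranges) / len(ranges), sum(entropies) / len(entropies)]


def word_features(img):
    """8-D feature vector for one word image (hypothetical layout)."""
    flat = [v for row in img for v in row]
    return histogram_features(flat) + local_features(img)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest (Euclidean) to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, `knn_predict(train, word_features(img))` labels a new word image given a list of `(word_features(...), label)` pairs built from labeled handwritten and printed words. Because the features live on very different scales (the mean can reach 255 while uniformity stays below 1), z-scoring each feature with training-set statistics before computing distances is usually advisable; the sketch omits that step for brevity.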

CV, Mar 12, 2013

Gaussian Mixture Model for Handwritten Script Identification

Mallikarjun Hangarge

This paper presents a Gaussian Mixture Model (GMM) to identify the script of handwritten words written in Roman, Devanagari, Kannada and Telugu. It emphasizes the significance of directional energies for identifying the script of a word, and it is robust to varied image sizes and different writing styles. The GMM is modeled using a set of six novel features derived from the directional energy distributions of the underlying image. The standard deviations of the directional energy distributions are computed by decomposing the image matrix into right and left diagonals. Furthermore, the deviations of the horizontal and vertical energy distributions are also built into the GMM. Of a dataset of 800 images (200 per script), 400 are used to train the GMM and the remaining 400 for testing. Exhaustive experiments are carried out at the bi-script, tri-script and multi-script levels, achieving script identification accuracies of 98.7%, 98.16% and 96.91%, respectively.
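The abstract does not define the six features exactly, so the sketch below uses a hypothetical four-dimensional stand-in: the standard deviation of the energy profile (sum of squared intensities) along rows, columns, diagonals and anti-diagonals. It also assumes the common GMM classification scheme of fitting one diagonal-covariance mixture per script with EM and assigning a word to the script whose mixture gives the highest log-likelihood. The component count, iteration count, variance floor and all names are illustrative assumptions, not the author's method.

```python
import math
import random


def directional_energy_features(img):
    """Hypothetical 4-D descriptor (not the paper's six features): std of the
    energy profile along rows, columns, diagonals (constant j - i) and
    anti-diagonals (constant i + j). Energy = sum of squared intensities."""
    h, w = len(img), len(img[0])

    def std(vals):
        m = sum(vals) / len(vals)
        return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

    rows = [sum(v * v for v in row) for row in img]
    cols = [sum(img[i][j] ** 2 for i in range(h)) for j in range(w)]
    diag = [0.0] * (h + w - 1)
    anti = [0.0] * (h + w - 1)
    for i in range(h):
        for j in range(w):
            diag[j - i + h - 1] += img[i][j] ** 2
            anti[i + j] += img[i][j] ** 2
    return [std(rows), std(cols), std(diag), std(anti)]


class DiagGMM:
    """Gaussian mixture with diagonal covariances, fitted by EM."""

    def __init__(self, k=2, iters=50, min_var=1e-2, seed=0):
        self.k, self.iters, self.min_var = k, iters, min_var
        self.rng = random.Random(seed)

    @staticmethod
    def _log_gauss(x, mu, var):
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mu, var))

    def _component_logs(self, x):
        return [math.log(self.w[c]) + self._log_gauss(x, self.mu[c], self.var[c])
                for c in range(self.k)]

    def fit(self, X):
        n, d = len(X), len(X[0])
        self.mu = [list(x) for x in self.rng.sample(X, self.k)]  # random samples as initial means
        gm = [sum(x[j] for x in X) / n for j in range(d)]
        gv = [max(sum((x[j] - gm[j]) ** 2 for x in X) / n, self.min_var) for j in range(d)]
        self.var = [list(gv) for _ in range(self.k)]  # start every component at the global variance
        self.w = [1.0 / self.k] * self.k
        for _ in range(self.iters):
            # E-step: posterior responsibility of each component for each sample
            resp = []
            for x in X:
                logs = self._component_logs(x)
                top = max(logs)
                ps = [math.exp(lp - top) for lp in logs]
                s = sum(ps)
                resp.append([p / s for p in ps])
            # M-step: re-estimate weights, means and floored variances
            for c in range(self.k):
                nc = sum(r[c] for r in resp) + 1e-10
                self.w[c] = nc / n
                self.mu[c] = [sum(r[c] * x[j] for r, x in zip(resp, X)) / nc for j in range(d)]
                self.var[c] = [max(sum(r[c] * (x[j] - self.mu[c][j]) ** 2
                                       for r, x in zip(resp, X)) / nc, self.min_var)
                               for j in range(d)]
        return self

    def log_likelihood(self, x):
        """log p(x) under the mixture, computed with log-sum-exp."""
        logs = self._component_logs(x)
        top = max(logs)
        return top + math.log(sum(math.exp(lp - top) for lp in logs))


def identify_script(models, x):
    """Return the script whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda script: models[script].log_likelihood(x))
```

In use, one would build `models = {"Kannada": DiagGMM().fit(kannada_vectors), ...}` from the training half of each script's images and call `identify_script(models, directional_energy_features(word_img))` on each test word. Diagonal covariances keep the EM updates simple and stable for small training sets; the variance floor prevents a component from collapsing onto a single sample.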