CV · Mar 13, 2013

Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents

Mallikarjun Hangarge, K. C. Santosh, Srikanth Doddamani, Rajmohan Pardeshi

arXiv:1303.3087v1 · 26 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses document digitization and text recognition for South Indian languages. It is an incremental contribution in a domain-specific area.

The paper classifies words as handwritten or printed text in South Indian scripts using statistical texture features, achieving an average classification rate of 99.26% on Kannada, Telugu, Malayalam, and Hindi (Devanagari) datasets.

In this paper, we use statistical texture features to classify handwritten and printed text, primarily at the word level in South Indian scripts. Words are first extracted from the scanned document. For each extracted word, we compute statistical texture features: mean, standard deviation, smoothness, moment, uniformity, entropy, and local features (local range and local entropy). The resulting feature vectors are then classified with a k-NN classifier. We validated the approach on several datasets, primarily in Kannada, Telugu, Malayalam, and Devanagari (Hindi) scripts, achieving an average classification rate of 99.26%. In addition, to demonstrate the extensibility of the approach, we apply it to Roman script using a publicly available dataset and report interesting results.
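The abstract names the features and the classifier but gives neither formulas nor parameters. The following is a minimal, dependency-free Python sketch under our own assumptions, not the authors' implementation. The six global descriptors use the standard normalized-histogram definitions from image-processing textbooks (e.g., Gonzalez and Woods), with variance and third moment divided by (L−1)² for L = 256 gray levels; "moment" is assumed to mean the third central moment. Local range and local entropy are averaged over hypothetical 3×3 windows, features are z-score normalized (also an assumption), and `k=3` with Euclidean distance is a hypothetical choice. All function names are illustrative.

```python
import math
from collections import Counter

LEVELS = 256  # assumption: 8-bit grayscale word images, values 0..255


def histogram_features(img):
    """Six global descriptors from the normalized gray-level histogram p(z).

    img: list of rows of ints in [0, LEVELS - 1].
    Returns [mean, std, smoothness, third_moment, uniformity, entropy].
    """
    pixels = [v for row in img for v in row]
    n = len(pixels)
    p = {z: c / n for z, c in Counter(pixels).items()}  # only nonzero bins
    mean = sum(z * pz for z, pz in p.items())
    mu2 = sum((z - mean) ** 2 * pz for z, pz in p.items())  # variance
    mu3 = sum((z - mean) ** 3 * pz for z, pz in p.items())  # third central moment
    scale = (LEVELS - 1) ** 2  # assumed normalization of mu2 and mu3
    smoothness = 1 - 1 / (1 + mu2 / scale)  # R = 0 for a constant region
    third_moment = mu3 / scale
    uniformity = sum(pz * pz for pz in p.values())  # U = 1 for a constant region
    entropy = -sum(pz * math.log2(pz) for pz in p.values())  # in bits
    return [mean, math.sqrt(mu2), smoothness, third_moment, uniformity, entropy]


def local_features(img, radius=1):
    """Mean local range and mean local entropy over (2r+1)x(2r+1) windows.

    Windows are clipped at the image border; radius=1 (3x3) is hypothetical.
    """
    h, w = len(img), len(img[0])
    ranges, entropies = [], []
    for i in range(h):
        for j in range(w):
            win = [img[y][x]
                   for y in range(max(0, i - radius), min(h, i + radius + 1))
                   for x in range(max(0, j - radius), min(w, j + radius + 1))]
            m = len(win)
            ranges.append(max(win) - min(win))
            entropies.append(-sum(c / m * math.log2(c / m)
                                  for c in Counter(win).values()))
    return [sum(ranges) / len(ranges), sum(entropies) / len(entropies)]


def word_features(img):
    """Hypothetical 8-D feature vector for one segmented word image."""
    return histogram_features(img) + local_features(img)


def zscore_fit(X):
    """Per-feature mean and std of training vectors (a std of 0 becomes 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return means, stds


def zscore_apply(x, means, stds):
    """Normalize one feature vector with statistics from zscore_fit."""
    return [(v - mu) / s for v, mu, s in zip(x, means, stds)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(zip((math.dist(t, x) for t in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

To use it, compute `word_features` for each labeled word image, fit `zscore_fit` on the training vectors, normalize training and query vectors with `zscore_apply`, then call `knn_predict`. Normalization matters here because the raw mean lies in 0..255 while entropy lies in 0..8, so unscaled Euclidean distance would be dominated by the intensity features. The pure-Python loops keep the sketch self-contained; a practical pipeline would more likely vectorize them.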
