CV · Mar 13, 2013

Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents

arXiv:1303.3087v1 · 26 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses document digitization and text recognition for South Indian languages. It is an incremental improvement in a domain-specific area.

The paper tackles the problem of classifying handwritten and printed text at the word level in South Indian scripts using statistical texture features, achieving an average classification rate of 99.26% across datasets including Kannada, Telugu, Malayalam, and Hindi.

In this paper, we use statistical texture features to classify handwritten and printed text, aiming primarily at word-level classification in South Indian scripts. Words are first extracted from the scanned document. For each extracted word, we compute statistical texture features: mean, standard deviation, smoothness, moment, uniformity, entropy, and local range, including local entropy. The resulting feature vectors are then classified with a k-NN classifier. We have validated the approach on several datasets, primarily in the Kannada, Telugu, Malayalam, and Hindi (Devanagari) scripts, where it achieves an average classification rate of 99.26%. In addition, to show that the approach extends to other scripts, we apply it to Roman script using a publicly available dataset and report interesting results.
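The pipeline described in the abstract (per-word texture features followed by k-NN) can be illustrated with a minimal sketch. The abstract does not give formulas, so the code below assumes the common histogram-based definitions of mean, standard deviation, smoothness, third moment, uniformity, and entropy. The local range and local entropy features are omitted, and the function names `texture_features` and `knn_predict` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def texture_features(pixels, levels=256):
    """Histogram-based texture descriptors for one word image.

    `pixels` is a flat list of integer gray levels in [0, levels - 1].
    The definitions are assumed textbook forms, not the paper's own.
    """
    n = len(pixels)
    # Normalised gray-level histogram p(g).
    probs = {g: c / n for g, c in Counter(pixels).items()}

    mean = sum(g * p for g, p in probs.items())
    var = sum((g - mean) ** 2 * p for g, p in probs.items())
    std = math.sqrt(var)

    # Variance is scaled to [0, 1] before computing smoothness R = 1 - 1/(1 + var).
    scale = (levels - 1) ** 2
    smoothness = 1 - 1 / (1 + var / scale)
    # Third central moment (skewness of the histogram), scaled the same way.
    third_moment = sum((g - mean) ** 3 * p for g, p in probs.items()) / scale
    uniformity = sum(p * p for p in probs.values())
    entropy = -sum(p * math.log2(p) for p in probs.values())

    return [mean, std, smoothness, third_moment, uniformity, entropy]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In this sketch the features are on very different scales (the mean can reach 255 while uniformity stays in [0, 1]), so a practical version would probably normalise each feature before computing distances. Whether the paper does so is not stated in the abstract.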
