A New Approach for Texture based Script Identification At Block Level using Quad Tree Decomposition
This work addresses script identification for OCR systems in multilingual regions such as India, but the contribution is incremental: it applies existing methods to a specific domain.
The paper tackles the problem of identifying handwritten scripts in multi-script Indian documents using Gabor wavelets and quad-tree decomposition, achieving a best accuracy of 96.86% with an MLP classifier.
Considerable success has been achieved in developing monolingual OCR systems for Indic scripts. But in a country like India, where multi-script documents are prevalent, identifying the script beforehand becomes obligatory. In this paper, we present the significance of Gabor wavelet filters in extracting directional energy and entropy distributions for 11 official handwritten scripts, namely Bangla, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Oriya, Tamil, Telugu, Urdu and Roman. The experimentation is conducted at block level using a quad-tree decomposition approach and evaluated with six well-known classifiers. The best identification accuracy of 96.86% is achieved by the Multi Layer Perceptron (MLP) classifier under 3-fold cross validation at level-2 decomposition. The results establish the efficacy of the present approach for the classification of handwritten Indic scripts.
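To make the block-level pipeline concrete, the following is a minimal sketch of quad-tree decomposition followed by directional Gabor energy and entropy extraction. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the filter parameters, the number of orientations, the exact entropy definition, or how per-block features are combined. The function names, the kernel size, wavelength and sigma, the choice of six orientations, and the Shannon entropy of the normalised response power are all hypothetical choices made here.

```python
import numpy as np

def quadtree_blocks(image, level):
    """Split a 2-D array into 4**level equal blocks (full quad-tree at this depth)."""
    n = 2 ** level
    h, w = image.shape
    bh, bw = h // n, w // n  # rows/columns that do not divide evenly are dropped
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(n) for c in range(n)]

def gabor_kernel(size, theta, wavelength=8.0, sigma=4.0):
    """Complex Gabor kernel at orientation theta (parameter values are assumptions)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * x_rot / wavelength)

def response_magnitude(block, kernel):
    """Magnitude of the circular convolution of block and kernel, computed via FFT."""
    spectrum = np.fft.fft2(block) * np.fft.fft2(kernel, s=block.shape)
    return np.abs(np.fft.ifft2(spectrum))

def block_features(block, n_orient=6, ksize=15):
    """Return [energy, entropy] for each orientation, interleaved."""
    feats = []
    for k in range(n_orient):
        mag = response_magnitude(block, gabor_kernel(ksize, np.pi * k / n_orient))
        power = mag ** 2
        energy = power.mean()
        total = power.sum()
        if total > 0:
            p = power.ravel() / total
            p = p[p > 0]
            entropy = float(-(p * np.log2(p)).sum())  # Shannon entropy in bits
        else:
            entropy = 0.0  # blank block: no response to distribute
        feats.extend([energy, entropy])
    return np.array(feats)

def script_block_features(image, level=2, n_orient=6):
    """One feature vector per quad-tree block: shape (4**level, 2 * n_orient)."""
    return np.vstack([block_features(b.astype(float), n_orient)
                      for b in quadtree_blocks(image, level)])

# Usage on a synthetic 256x256 "page" (random data stands in for a scanned block)
rng = np.random.default_rng(0)
page = rng.random((256, 256))
features = script_block_features(page, level=2)
```

At level-2 decomposition this yields 16 blocks per image, each described by one energy and one entropy value per orientation. The resulting rows could then be passed to any standard classifier, such as an MLP, for evaluation under cross validation.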