Deep learning for word-level handwritten Indic script identification
This work addresses script identification for Indic languages; it is an incremental improvement on a domain-specific task.
The paper tackles word-level handwritten Indic script identification by using CNNs to extract features from multilevel 2D discrete Haar wavelet representations of word images. With a multilayer perceptron (MLP) classifier, it reaches a maximum identification rate of 94.73%, outperforming state-of-the-art techniques.
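The multilevel 2D Haar decomposition can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the orthonormal Haar filters and assuming that the representations at "different sizes" are the successive low-pass (LL) approximations; the paper's decomposition depth and the sub-bands fed to each CNN are not given here, and the names `haar_dwt2` and `haar_pyramid` are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar DWT (illustrative sketch).

    Returns the approximation band (LL) and the three detail bands,
    each half the height and width of the input.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled local average (low-pass)
    lh = (a + b - c - d) / 2.0  # top minus bottom: horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

def haar_pyramid(image, levels):
    """Apply haar_dwt2 repeatedly to the LL band.

    Returns one LL approximation per level; each is half the
    height and width of the previous one.
    """
    approximations = []
    current = np.asarray(image, dtype=np.float64)
    for _ in range(levels):
        current, _details = haar_dwt2(current)
        approximations.append(current)
    return approximations

# Usage: a hypothetical 64x256 word image decomposed over three levels.
word = np.random.default_rng(0).random((64, 256))
for level, band in enumerate(haar_pyramid(word, 3), start=1):
    print(level, band.shape)
```

Because the four Haar filters are orthonormal, each level preserves the image's energy (sum of squared pixel values) across its four sub-bands, so the smaller representations discard spatial resolution rather than signal energy.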
We propose a novel method that uses convolutional neural networks (CNNs) for feature extraction. Beyond the conventional spatial-domain representation, we use the multilevel 2D discrete Haar wavelet transform, in which image representations are scaled to several different sizes. These representations are then used to train different CNNs that select features. Specifically, we use 10 CNNs that together select 10,240 features, i.e. 1,024 per CNN. With these features, we identify 11 handwritten scripts, using 1,000 words per script. In our tests, we achieved a maximum script identification rate of 94.73% using a multilayer perceptron (MLP). Our results outperform state-of-the-art techniques.
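The feature-fusion arithmetic (10 CNNs × 1,024 features = 10,240 features) and the final MLP stage can be illustrated as below. This is a sketch only: the CNN outputs are stood in for by random vectors, the MLP weights are random and untrained, the hidden width `HIDDEN = 256` and the ReLU/softmax choices are assumptions, and `fuse_features` and `mlp_forward` are hypothetical names rather than the authors' code.

```python
import numpy as np

NUM_CNNS = 10          # from the abstract
FEATS_PER_CNN = 1024   # from the abstract
NUM_SCRIPTS = 11       # from the abstract
HIDDEN = 256           # assumption: the MLP's hidden size is not stated

def fuse_features(per_cnn):
    """Concatenate the 1,024-d output of each CNN into one feature vector."""
    if len(per_cnn) != NUM_CNNS:
        raise ValueError(f"expected {NUM_CNNS} feature vectors")
    for f in per_cnn:
        if np.shape(f) != (FEATS_PER_CNN,):
            raise ValueError(f"each feature vector needs {FEATS_PER_CNN} entries")
    return np.concatenate(per_cnn)  # shape (NUM_CNNS * FEATS_PER_CNN,) = (10240,)

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP: ReLU hidden layer, softmax over the scripts."""
    h = np.maximum(0.0, x @ w1 + b1)
    logits = h @ w2 + b2
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Usage with random stand-ins for the CNN features and untrained weights.
rng = np.random.default_rng(0)
x = fuse_features([rng.standard_normal(FEATS_PER_CNN) for _ in range(NUM_CNNS)])
w1 = rng.standard_normal((x.size, HIDDEN)) * 0.01
w2 = rng.standard_normal((HIDDEN, NUM_SCRIPTS)) * 0.01
probs = mlp_forward(x, w1, np.zeros(HIDDEN), w2, np.zeros(NUM_SCRIPTS))
print(x.shape, probs.shape, int(probs.argmax()))
```

Concatenation keeps each CNN's features in a fixed block of the fused vector, so the MLP's first layer can learn separate weights for every wavelet-scale representation.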