CV · CL · Jun 9, 2020

Tamil Vowel Recognition With Augmented MNIST-like Data Set

arXiv:2006.08367v2
AI Analysis

This work addresses Tamil OCR/handwriting recognition, but its contribution is incremental: it applies an existing method (a CNN) to a new dataset.

The researchers tackled the problem of Tamil vowel recognition by creating an MNIST-like dataset and training a CNN, achieving 82% cross-validation accuracy and 70% top-1 accuracy on handwritten vowels.

We report the generation of an MNIST [4] compatible data set [1] for Tamil vowels, intended to enable building classification DNNs or other deep learning [2] models for Tamil OCR/handwriting applications. The data set consists of 60,000 grayscale 28x28 pixel images. We show that it can be used to train a 4-layer CNN with 100,000+ parameters in TensorFlow that reaches 92% training accuracy and 82% cross-validation accuracy. For the same network, we also report a top-1 classification accuracy of 70% and a top-2 classification accuracy of 92% on handwritten vowels.
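To make the top-1 and top-2 figures above concrete, here is a minimal sketch of how top-k accuracy is typically computed from per-class scores. It is not the authors' evaluation code: the function name `top_k_accuracy`, the three-class setup, and the sample scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier outputs: 4 samples, 3 classes (illustrative values only).
scores = [
    [0.7, 0.2, 0.1],  # true class 0: correct at top-1
    [0.3, 0.6, 0.1],  # true class 0: correct only at top-2
    [0.1, 0.3, 0.6],  # true class 2: correct at top-1
    [0.5, 0.4, 0.1],  # true class 2: missed at both top-1 and top-2
]
labels = [0, 0, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-2 accuracy can never be lower than top-1, since any sample counted at k=1 is also counted at k=2. A large gap between the two, such as the reported 70% versus 92%, suggests that the correct vowel is often the network's second choice.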

