CV LG Apr 29, 2020

MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters

Jannatul Ferdous, Suvrajit Karmaker, A K M Shahariar Azad Rabby, Syed Akhter Hossain

arXiv:2005.02155v22.37 citations

Originality Synthesis-oriented

AI Analysis

This work addresses a data bottleneck for researchers working on Bangla OCR and handwriting recognition, though it is incremental, since it primarily provides a new dataset rather than a novel method.

The authors tackle the lack of a comprehensive dataset for Bangla handwritten compound characters by creating MatriVasha, a dataset covering 120 compound characters with 2,552 isolated samples collected from diverse writers across Bangladesh. According to the authors, it is currently the most extensive dataset available for this purpose.

Recognition of handwritten Bangla compound characters has been an important problem for many years. Application-driven research in machine learning and deep learning has gained interest in recent years, most notably in handwriting recognition, which has significant applications such as Bangla OCR. MatriVasha is a project for recognizing handwritten Bangla compound characters. Compound character recognition matters because of its varied applications, and it helps digitize old forms and other information reliably. Unfortunately, there is no comprehensive dataset that covers all types of Bangla compound characters. MatriVasha is an attempt to address this, and the task is challenging because each person writes character shapes in a unique style. MatriVasha proposes a dataset intended for recognizing 120 (one hundred twenty) Bangla compound characters, consisting of 2552 (two thousand five hundred fifty-two) isolated handwritten characters written by unique writers and collected within Bangladesh. The dataset also supports district-, age-, and gender-based handwriting research, because the samples cover a variety of districts and age groups and an equal number of male and female writers. As of now, our proposed dataset is the most extensive dataset for Bangla compound characters. It is intended to support the development of recognition techniques for handwritten Bangla compound characters. In the future, this dataset will be made publicly available to help widen the research.
