CV, Jun 6, 2018

NumtaDB - Assembled Bengali Handwritten Digits

arXiv:1806.02452v1, 44 citations
Originality: Synthesis-oriented
AI Analysis

NumtaDB provides a benchmark dataset for Bengali digit recognition algorithms and addresses biases in existing data. The contribution is incremental, since it concerns data creation rather than new methods.

The authors address the lack of a large, unbiased public dataset for Bengali digit recognition by assembling NumtaDB, a collection of more than 85,000 images of handwritten Bengali digits, and they document how it was collected and curated.

To benchmark Bengali digit recognition algorithms, a large, publicly available dataset is required that is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset of more than 85,000 images of handwritten Bengali digits, has been assembled. This paper documents how the numerals were collected and curated, along with the salient statistics of the dataset.

Code Implementations: 2 repos