NumtaDB - Assembled Bengali Handwritten Digits
NumtaDB provides a benchmark dataset for Bengali digit recognition algorithms that addresses biases in existing data. The contribution is incremental, since it focuses on data creation rather than new methods.
The authors address the lack of a large, unbiased public dataset for Bengali digit recognition by assembling NumtaDB, which contains more than 85,000 images of handwritten Bengali digits, and they document its collection and curation process.
Benchmarking Bengali digit recognition algorithms requires a large, publicly available dataset that is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset consisting of more than 85,000 images of handwritten Bengali digits, has been assembled. This paper documents the collection and curation process of the numerals along with the salient statistics of the dataset.