CV, Jul 7, 2020

HKR For Handwritten Kazakh & Russian Database

Daniyar Nurseitov, Kairat Bostanbekov, Daniyar Kurmankhojayev, Anel Alimova, Abdelrahman Abdallah

arXiv:2007.03579v211.139 citations, Has Code

Originality: Synthesis-oriented

AI Analysis

The database gives handwriting recognition researchers a new resource, but the contribution is incremental: it supplies new data for existing methods rather than a new method.

The authors address the lack of a dedicated offline handwriting recognition dataset for Russian and Kazakh by creating a new database of more than 1400 filled forms, with approximately 63,000 sentences and more than 715,699 symbols from approximately 200 writers. About 95% of the content is Russian and about 5% is Kazakh.

In this paper, we present a new Russian and Kazakh database (about 95% Russian and 5% Kazakh words/sentences, respectively) for offline handwriting recognition. A few pre-processing and segmentation procedures have been developed together with the database. Both languages are written in Cyrillic and share the same 33 characters; the Kazakh alphabet contains 9 additional specific characters. The dataset is a collection of forms. The sources of all forms were generated with LaTeX, and the forms were subsequently filled out by hand by the writers. The database consists of more than 1400 filled forms, with approximately 63,000 sentences and more than 715,699 symbols produced by approximately 200 different writers. It can serve researchers working on handwriting recognition with deep learning and other machine learning methods.
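The alphabet arithmetic in the abstract (33 shared characters plus 9 Kazakh-specific ones) can be made concrete with a minimal sketch of a character vocabulary for a recognizer. This is an illustration only, not the authors' pipeline: the names `RUSSIAN_LOWER`, `KAZAKH_EXTRA_LOWER`, and `build_charset` are hypothetical, the Kazakh letters listed are those of the standard Kazakh Cyrillic alphabet, and only lowercase letters are covered (the real dataset presumably also contains uppercase letters, digits, and punctuation).

```python
# Hypothetical sketch: a lowercase character vocabulary for Russian + Kazakh.
# Not taken from the HKR code base; names and index layout are assumptions.

# Russian lowercase: the 32 contiguous code points U+0430..U+044F, plus "ё".
RUSSIAN_LOWER = [chr(c) for c in range(0x0430, 0x0450)] + ["ё"]

# The 9 additional letters of the standard Kazakh Cyrillic alphabet.
KAZAKH_EXTRA_LOWER = list("әғқңөұүһі")


def build_charset():
    """Map each lowercase letter to an integer class index."""
    letters = RUSSIAN_LOWER + KAZAKH_EXTRA_LOWER
    return {ch: idx for idx, ch in enumerate(letters)}


charset = build_charset()
print(len(RUSSIAN_LOWER), len(KAZAKH_EXTRA_LOWER), len(charset))
```

Under these assumptions the combined vocabulary has 42 lowercase letters, so a model trained only on a Russian alphabet would have no output classes for the 9 Kazakh-specific characters.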
