CV AI LG | Mar 16, 2021

Digital Peter: Dataset, Competition and Handwriting Recognition Methods

Mark Potanin, Denis Dimitrov, Alex Shonenkov, Vladimir Bataev, Denis Karachev, Maxim Novopoltsev

arXiv:2103.09354v28.012 citations Has Code

AI Analysis

The dataset gives researchers a benchmark on which to train and compare handwriting recognition models for historical documents.

The paper introduces a new dataset of 9,694 images from Peter the Great's manuscripts for handwriting text recognition, along with a segmentation procedure and baseline methods, and describes an open competition based on this dataset.

This paper presents a new dataset of Peter the Great's manuscripts and describes a segmentation procedure that converts the original document images into individual lines. The dataset can be used to train handwriting text recognition models and serves as a benchmark for comparing them. It consists of 9,694 images, each paired with a text file containing the transcription of the corresponding line in a historical document. An open machine learning competition, Digital Peter, was held using this dataset. The article describes the baseline solution for the competition as well as more advanced methods for handwritten text recognition. The full dataset and all code are publicly available.
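As a rough illustration of the image-plus-text-file structure described above, the following Python sketch pairs each line image with its transcription by matching file names. The directory names (`images`, `words`), the `.jpg` and `.txt` extensions, and the helper `pair_lines` are assumptions made for this example, not details confirmed by the abstract; the actual layout of the published dataset may differ.

```python
from pathlib import Path


def pair_lines(image_dir, text_dir, image_ext=".jpg"):
    """Return (image_path, transcription) pairs matched by file stem.

    Assumes (hypothetically) that each line image such as `1_1_1.jpg`
    has a transcription stored as `1_1_1.txt` in `text_dir`.
    Images without a matching text file are skipped.
    """
    image_dir, text_dir = Path(image_dir), Path(text_dir)
    pairs = []
    for image_path in sorted(image_dir.glob(f"*{image_ext}")):
        text_path = text_dir / f"{image_path.stem}.txt"
        if text_path.exists():
            transcription = text_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, transcription))
    return pairs
```

Calling `pair_lines("images", "words")` would then return a list that a recognition model's data loader could iterate over, with one entry per transcribed line.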
