CV · Nov 4, 2020

Handwriting Classification for the Analysis of Art-Historical Documents

arXiv:2011.02264v1
AI Analysis

This work addresses art historians' need for automatic analysis of large digitized archives. Its contribution is incremental, however: it introduces a new step into an existing handwriting OCR pipeline.

The paper tackles handwriting analysis in scanned art-historical documents. It proposes a handwriting classification model that labels text fragments, such as numbers or dates, based on their visual structure. This lets historians find documents that contain a specific class of text without reading them in full. The authors develop and compare several deep learning models and evaluate them in experiments on a real-world dataset.

Digitized archives contain and preserve the knowledge of generations of scholars in millions of documents. The size of these archives calls for automatic analysis since a manual analysis by specialists is often too expensive. In this paper, we focus on the analysis of handwriting in scanned documents from the art-historic archive of the WPI. Since the archive consists of documents written in several languages and lacks annotated training data for the creation of recognition models, we propose the task of handwriting classification as a new step for a handwriting OCR pipeline. We propose a handwriting classification model that labels extracted text fragments, e.g., numbers, dates, or words, based on their visual structure. Such a classification supports historians by highlighting documents that contain a specific class of text without the need to read the entire content. To this end, we develop and compare several deep learning-based models for text classification. In extensive experiments, we show the advantages and disadvantages of our proposed approach and discuss possible usage scenarios on a real-world dataset.
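The abstract's usage scenario turns per-fragment class labels into a list of documents worth a historian's attention. The sketch below illustrates only that downstream step, under stated assumptions. The deep learning classifier is stubbed out as precomputed scores, the three class names are taken from the abstract's examples, and `Fragment`, `classify`, `highlight_documents`, and the `min_conf` threshold are hypothetical names, not the authors' code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Sequence

# Hypothetical label set, taken from the examples in the abstract;
# the paper's actual classes may differ.
CLASSES = ("number", "date", "word")


@dataclass
class Fragment:
    """A text fragment cut out of a scanned page (hypothetical structure)."""
    doc_id: str
    logits: Sequence[float]  # raw scores from some classifier, one per class


def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fragment):
    """Return (label, probability) of the most likely class for one fragment."""
    probs = softmax(fragment.logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


def highlight_documents(fragments, target, min_conf=0.5):
    """Count, per document, the fragments confidently classified as `target`.

    Documents with no matching fragment are left out, so the keys of the
    result are the documents a historian might want to look at.
    """
    hits = defaultdict(int)
    for frag in fragments:
        label, conf = classify(frag)
        if label == target and conf >= min_conf:
            hits[frag.doc_id] += 1
    return dict(hits)


# Usage with made-up scores standing in for classifier output.
frags = [
    Fragment("letter_001", (0.1, 2.5, 0.3)),  # highest score: "date"
    Fragment("letter_001", (1.8, 0.2, 0.4)),  # highest score: "number"
    Fragment("invoice_17", (0.2, 0.1, 3.0)),  # highest score: "word"
]
print(highlight_documents(frags, "date"))  # {'letter_001': 1}
```

The confidence threshold is one simple way to trade recall for precision when flagging documents; whether and how the paper filters low-confidence predictions is not stated in this summary.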

Code Implementations: 1 repo
