CV · DL · Aug 16, 2022

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

arXiv:2208.07682v1 · 20 citations · h-index: 66
Originality: Synthesis-oriented
AI Analysis

The LAM dataset gives HTR researchers a specialized benchmark, particularly for historical documents, but the contribution is incremental: it adds to existing benchmarks without solving core HTR challenges.

The authors introduce the LAM dataset, a new benchmark for line-level handwritten text recognition (HTR) built from ancient Italian manuscripts edited by a single author over 60 years. They evaluate state-of-the-art HTR architectures on it and report recognition performance for both the basic and the date-based configurations.

Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. The main challenges, when dealing with historical manuscripts, are due to the preservation of the paper support, the variability of the handwriting (even of the same author over a wide time-span), and the scarcity of data from ancient, poorly represented languages. With the aim of fostering the research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available. For both configurations, we analyze quantitative and qualitative characteristics, also with respect to other line-level HTR benchmarks, and present the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at https://aimagelab.ing.unimore.it/go/lam.
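The date-based configuration described in the abstract can be illustrated with a minimal sketch. It is not the authors' splitting procedure: the record layout (`image`, `text`, `year`), the sample years, the held-out range, and the function name `date_based_split` are all hypothetical, chosen only to show how whole time periods might be withheld from training.

```python
def date_based_split(records, held_out_ranges):
    """Split line records so that held-out time periods never appear in training.

    records: list of dicts, each with a hypothetical "year" key.
    held_out_ranges: list of (start_year, end_year) tuples, inclusive.
    """
    def is_held_out(year):
        return any(start <= year <= end for start, end in held_out_ranges)

    train = [r for r in records if not is_held_out(r["year"])]
    test = [r for r in records if is_held_out(r["year"])]
    return train, test


# Toy line-level records; file names, transcriptions, and years are invented.
records = [
    {"image": "line_0001.png", "text": "prima riga", "year": 1700},
    {"image": "line_0002.png", "text": "seconda riga", "year": 1712},
    {"image": "line_0003.png", "text": "terza riga", "year": 1725},
    {"image": "line_0004.png", "text": "quarta riga", "year": 1741},
]

# Withhold one assumed period entirely; the model never sees it in training.
train, test = date_based_split(records, held_out_ranges=[(1720, 1729)])
```

Splitting by period rather than at random means the test lines come from years with no training examples at all, which is the situation the date-based setting is meant to probe.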

