CV | EM | Jan 22, 2021

HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

arXiv:2101.10862v2385 citations
AI Analysis

HANA provides a resource for improving handwritten text recognition models, particularly for historical data linking. The contribution is incremental in that it focuses on database creation rather than novel method development.

The authors tackled the problem of transcription errors in personal names for historical data linking by constructing HANA, a large-scale database of over 3.3 million names and 1.1 million images, which improved transcription accuracy on Danish and US census data.

Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Personal names are probably the single most important identifier for linking. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of error are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 3.3 million names. The database contains more than 105 thousand unique names with a total of more than 1.1 million images of personal names, and proves useful for transfer learning to other settings. We provide three examples hereof, obtaining significantly improved transcription accuracy on both Danish and US census data. In addition, we present benchmark results for deep learning models automatically transcribing the personal names from the scanned documents. By making more challenging large-scale databases publicly available, we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition.

Code Implementations: 1 repo
