CV · AI · Jul 11, 2025

Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates

arXiv:2507.08636v1 · 5 citations · h-index: 14 · ICDAR
Originality: Incremental advance
AI Analysis

This paper addresses the problem of automating information extraction from historical handwritten documents, a task of interest to archivists and researchers. The contribution is incremental: it compares annotation strategies for an existing method rather than proposing a new one.

The study evaluated the Document Attention Network (DAN) for extracting information from handwritten Uruguayan birth certificates. It found that normalized annotation worked better for fields that can be standardized, such as dates and places of birth, while diplomatic annotation was superior for fields that cannot, such as names and surnames.

This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from Uruguayan birth certificates, handwritten in Spanish. We investigate two annotation strategies for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets containing the same images (201 scans of birth certificates written by more than 15 different writers) but with different annotation methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.
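To make the two annotation strategies concrete, here is a minimal Python sketch of how one page might carry both a diplomatic label (the text as written) and a normalized label (the text mapped to a standard form). The field names, the example values, the ISO date format, and the `key: value` target layout are all assumptions made for illustration. They are not taken from the paper or its dataset, and this is not DAN's actual target format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAnnotation:
    """One key-value field annotated in both styles (hypothetical schema)."""
    key: str
    diplomatic: str  # transcription exactly as written on the page
    normalized: str  # same content mapped to a standard form


# Invented example values, NOT taken from the paper's dataset.
RECORD = [
    FieldAnnotation(
        key="birth_date",
        diplomatic="veinte de Marzo de mil novecientos cinco",
        normalized="1905-03-20",  # assumed ISO 8601 convention
    ),
    FieldAnnotation(
        key="given_name",
        diplomatic="María Josefa",
        normalized="María Josefa",  # left unchanged in this sketch
    ),
]


def build_target(fields, style):
    """Build the training target string for one page in the chosen style."""
    if style not in ("diplomatic", "normalized"):
        raise ValueError(f"unknown annotation style: {style!r}")
    return "\n".join(f"{f.key}: {getattr(f, style)}" for f in fields)


def differing_keys(fields):
    """Return the keys whose two annotations differ."""
    return [f.key for f in fields if f.diplomatic != f.normalized]


if __name__ == "__main__":
    print(build_target(RECORD, "diplomatic"))
    print(build_target(RECORD, "normalized"))
    print(differing_keys(RECORD))
```

In this sketch, the same images would yield two datasets that differ only in their target strings, which mirrors the experimental setup described in the abstract. How the paper's normalized dataset treats name fields is not stated there, so the sketch simply leaves them unchanged.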

