CV | Apr 1, 2025

Archival Faces: Detection of Faces in Digitized Historical Documents

arXiv:2504.00558v2 | h-index: 1 | ICDAR
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better face detection in historical archives to enhance searchability, but it is incremental as it focuses on dataset creation rather than a novel detection method.

The paper tackles the low accuracy of face detection in digitized historical documents, where existing tools achieve only around 24% mAP. It introduces a new manually annotated dataset of 2.2k images with 11k annotations; retraining existing detectors on this dataset improves their detection results.

When digitizing historical archives, it is necessary to find the faces of celebrities and ordinary people, especially in newspapers, link them to the surrounding text, and make them searchable. Existing face detectors fail remarkably on scanned historical documents: current detection tools achieve only around 24% mAP at 50:90% IoU. This work addresses that failure by introducing a new manually annotated, domain-specific dataset in the style of the popular WIDER FACE dataset. It contains 2.2k new images from digitized historical newspapers of the 19th and 20th centuries, with 11k new bounding-box annotations and associated facial landmarks. The dataset allows existing detectors to be retrained, bringing their results closer to the standard for face detection in the wild. We report experimental results comparing several families of fine-tuned detectors against publicly available pre-trained face detectors, along with ablation studies over multiple detector sizes that cover both detection and landmark prediction performance.
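The "mAP at 50:90% IoU" figure rests on the intersection-over-union (IoU) overlap between a predicted and an annotated face box, checked at several thresholds. The sketch below illustrates only that overlap test for a single pair of boxes; it is not the paper's evaluation code. The `(x1, y1, x2, y2)` box format, the 5-point threshold step, and the helper names `iou`, `iou_thresholds`, and `matched_fraction` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds(start=50, stop=90, step=5):
    """IoU thresholds from 50% to 90%; the 5-point step is an assumption."""
    return [t / 100 for t in range(start, stop + 1, step)]


def matched_fraction(pred_box, true_box):
    """Fraction of thresholds at which the prediction counts as a match."""
    overlap = iou(pred_box, true_box)
    thresholds = iou_thresholds()
    return sum(overlap >= t for t in thresholds) / len(thresholds)
```

A full mAP computation would additionally rank detections by confidence and integrate precision over recall at each threshold; that part is omitted here.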
