CVNov 19, 2025

Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation

arXiv:2511.15875v1h-index: 22
Originality Incremental advance
AI Analysis

This addresses the data scarcity issue for researchers and practitioners in historical document analysis, though it is incremental as it builds on existing synthetic data and domain adaptation methods.

The paper tackles the problem of scarce annotated training data for historical map segmentation by bootstrapping synthetic maps with realistic style and noise, achieving domain-adaptive semantic segmentation with a Self-Constructing Graph Convolutional Network.

The automated analysis of historical documents, particularly maps, has drastically benefited from advances in deep learning and its success across various computer vision applications. However, most deep learning-based methods heavily rely on large amounts of annotated training data, which are typically unavailable for historical maps, especially for those belonging to specific, homogeneous cartographic domains, also known as corpora. Creating high-quality training data suitable for machine learning often takes a significant amount of time and involves extensive manual effort. While synthetic training data can alleviate the scarcity of real-world samples, it often lacks the affinity (realism) and diversity (variation) necessary for effective learning. By transferring the cartographic style of an original historical map corpus onto vector data, we bootstrap an effectively unlimited number of synthetic historical maps suitable for tasks such as land-cover interpretation of a homogeneous historical map corpus. We propose an automatic deep generative approach and a alternative manual stochastic degradation technique to emulate the visual uncertainty and noise, also known as data-dependent uncertainty, commonly observed in historical map scans. To quantitatively evaluate the effectiveness and applicability of our approach, the generated training datasets were employed for domain-adaptive semantic segmentation on a homogeneous map corpus using a Self-Constructing Graph Convolutional Network, enabling a comprehensive assessment of the impact of our data bootstrapping methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes