CV | Jul 22, 2025

Synthetic Data Matters: Re-training with Geo-typical Synthetic Labels for Building Detection

arXiv:2507.16657v1 | 3 citations | h-index: 8 | IEEE Trans Geosci Remote Sens
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of annotated data for building detection in remote sensing, offering a scalable way to improve model generalization without extensive real-world annotation.

The paper tackles the problem of poor generalization in building segmentation models across diverse geographic regions by proposing a method to re-train models at test time using geo-typical synthetic data tailored to target areas, resulting in median performance improvements of up to 12%.

Deep learning has significantly advanced building segmentation in remote sensing, yet models struggle to generalize to data from diverse geographic regions because city layouts and the distribution of building types, sizes and locations vary. However, time-consuming manual annotation may never produce enough data to capture worldwide diversity and keep pace with increasingly data-hungry models. We therefore propose a novel approach: re-training models at test time using synthetic data tailored to the target region's city layout. The method generates geo-typical synthetic data that closely replicates the urban structure of a target area by leveraging geospatial data such as street networks from OpenStreetMap. Using procedural modeling and physics-based rendering, very high-resolution synthetic images are created, with domain randomization in building shapes, materials, and environmental illumination. This enables the generation of virtually unlimited training samples that retain the essential characteristics of the target environment. To overcome synthetic-to-real domain gaps, our approach integrates geo-typical data into an adversarial domain adaptation framework for building segmentation. Experiments demonstrate significant performance gains, with median improvements of up to 12%, depending on the domain gap. This scalable and cost-effective method blends partial geographic knowledge with synthetic imagery, providing a promising solution to the "model collapse" issue in purely synthetic datasets. It offers a practical pathway to improving generalization in remote sensing building segmentation without extensive real-world annotations.
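The geo-typical generation step keeps the street layout of the target region fixed and randomizes the remaining scene attributes (building shapes, materials, illumination). As a rough illustration only, the Python sketch below samples a randomized, non-overlapping row of buildings along a single street segment plus a random sun position. The names `sample_street_scene`, `BuildingSpec`, and `SceneSpec`, as well as every value set and numeric range, are hypothetical assumptions and are not taken from the paper. A real pipeline would derive street geometry from OpenStreetMap and pass such specs to a procedural modeler and physics-based renderer.

```python
import random
from dataclasses import dataclass

# Hypothetical value sets; the paper's actual parameter spaces are not given here.
ROOF_SHAPES = ("flat", "gable", "hip")
MATERIALS = ("concrete", "brick", "metal", "tile")


@dataclass
class BuildingSpec:
    x: float          # start position along the street segment (m)
    setback: float    # distance from the street centreline (m)
    width: float      # footprint extent along the street (m)
    depth: float      # footprint extent away from the street (m)
    height: float     # building height (m)
    roof: str
    material: str


@dataclass
class SceneSpec:
    buildings: list
    sun_elevation_deg: float
    sun_azimuth_deg: float


def sample_street_scene(street_length_m, rng, gap_range=(2.0, 8.0),
                        width_range=(8.0, 20.0), depth_range=(8.0, 15.0),
                        height_range=(3.0, 30.0), setback_range=(5.0, 12.0)):
    """Place buildings along one street segment with randomized attributes.

    Only the street length stands in for real geospatial input (hypothetical);
    all other attributes are drawn at random (domain randomization).
    """
    buildings = []
    x = rng.uniform(*gap_range)  # leave a gap before the first building
    while True:
        width = rng.uniform(*width_range)
        if x + width > street_length_m:  # stop once the next building would not fit
            break
        buildings.append(BuildingSpec(
            x=x,
            setback=rng.uniform(*setback_range),
            width=width,
            depth=rng.uniform(*depth_range),
            height=rng.uniform(*height_range),
            roof=rng.choice(ROOF_SHAPES),
            material=rng.choice(MATERIALS),
        ))
        x += width + rng.uniform(*gap_range)  # advance past building plus a gap
    # Randomized environmental illumination for the renderer.
    return SceneSpec(
        buildings=buildings,
        sun_elevation_deg=rng.uniform(15.0, 75.0),
        sun_azimuth_deg=rng.uniform(0.0, 360.0),
    )


if __name__ == "__main__":
    scene = sample_street_scene(200.0, random.Random(0))
    print(len(scene.buildings), "buildings")
```

Seeding the `random.Random` instance makes each sampled scene reproducible, while drawing many seeds per street yields the "virtually unlimited" variation the abstract describes, all constrained to the same layout.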
