CV | Feb 10

GeoFormer: A Swin Transformer-Based Framework for Scene-Level Building Height and Footprint Estimation from Sentinel Imagery

arXiv:2602.09932v1 | h-index: 1 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of accurate 3D urban data for applications such as climate modeling and urban planning. It offers an open-source solution with strong cross-city generalization, but the advance is incremental because it builds on existing transformer methods.

The paper estimates building height and footprint from satellite imagery with GeoFormer, a Swin Transformer-based framework that takes Sentinel-1/2 imagery and open DEM data as input. It reports a building height RMSE of 3.19 m and a footprint RMSE of 0.05, improvements of 7.5% and 15.3% respectively over the strongest CNN baseline.
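
For readers unfamiliar with the metric, the sketch below shows how an RMSE and a percentage improvement over a baseline are typically computed. It assumes that "improvement" means the relative reduction in RMSE, which the summary does not state explicitly. The function names and the toy numbers are illustrative only and are not taken from the paper or its code.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Toy grid-cell heights in metres (made-up values, for illustration only).
reference = [0.0, 1.0]
predicted = [3.0, 5.0]
print(rmse(predicted, reference))
print(relative_improvement(baseline_rmse=4.0, model_rmse=3.0))
```

Under this definition, a lower model RMSE gives a positive percentage, and a model identical to the baseline gives zero.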

Accurate three-dimensional urban data are critical for climate modelling, disaster risk assessment, and urban planning, yet remain scarce due to reliance on proprietary sensors or poor cross-city generalisation. We propose GeoFormer, an open-source Swin Transformer framework that jointly estimates building height (BH) and footprint (BF) on a 100 m grid using only Sentinel-1/2 imagery and open DEM data. A geo-blocked splitting strategy ensures strict spatial independence between training and test sets. Evaluated over 54 diverse cities, GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, improving 7.5% and 15.3% over the strongest CNN baseline, while maintaining under 3.5 m BH RMSE in cross-continent transfer. Ablation studies confirm that DEM is indispensable for height estimation and that optical reflectance dominates over SAR, though multi-source fusion yields the best overall accuracy. All code, weights, and global products are publicly released.
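
The abstract does not describe how the geo-blocked split is built, so the following is only a minimal sketch of the general idea: samples are grouped into coarse latitude/longitude blocks, and whole blocks (never individual samples) are assigned to either the training or the test set. The block size, test fraction, and function names are assumptions chosen for illustration, not the authors' settings.

```python
import math
import random


def block_id(lon, lat, block_deg=0.5):
    """Map a coordinate to the index of the square block that contains it."""
    return (math.floor(lon / block_deg), math.floor(lat / block_deg))


def geo_blocked_split(coords, block_deg=0.5, test_fraction=0.2, seed=0):
    """Split sample indices so that no spatial block is shared by train and test.

    coords: list of (lon, lat) tuples, one per sample.
    Returns (train_indices, test_indices).
    """
    # Collect the distinct blocks, then shuffle them reproducibly.
    blocks = sorted({block_id(lon, lat, block_deg) for lon, lat in coords})
    rng = random.Random(seed)
    rng.shuffle(blocks)

    # Hold out whole blocks for testing (at least one).
    n_test = max(1, round(len(blocks) * test_fraction))
    test_blocks = set(blocks[:n_test])

    train_idx, test_idx = [], []
    for i, (lon, lat) in enumerate(coords):
        if block_id(lon, lat, block_deg) in test_blocks:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


# Hypothetical sample locations spread over several blocks.
coords = [(0.1 * k, 0.05 * k) for k in range(100)]
train_idx, test_idx = geo_blocked_split(coords)
print(len(train_idx), len(test_idx))
```

Compared with a random per-sample split, this keeps neighbouring grid cells from leaking between the two sets. Cells on either side of a block boundary can still be adjacent, so a stricter scheme might also drop a buffer zone around test blocks.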

