CV · Mar 14

Sky2Ground: A Benchmark for Site Modeling under Varying Altitude

arXiv:2603.13740 · 50.5 · h-index: 8
Predicted impact: top 69% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses challenges in large-scale, multi-altitude 3D perception for applications like site modeling, though it is incremental as it builds on existing pose estimation and reconstruction methods.

The paper tackles the problem of camera localization and reconstruction under varying altitudes by introducing the Sky2Ground dataset and SkyNet model, which improves multi-view alignment by 9.6% on RRA@5 and 18.1% on RTA@5 compared to existing methods.

We introduce Sky2Ground, a three-view dataset designed for varying-altitude camera localization, correspondence learning, and reconstruction. The dataset combines structured synthetic imagery with real, in-the-wild images, providing both controlled multi-view geometry and realistic scene noise. Each of the 51 sites contains thousands of satellite, aerial, and ground images spanning wide altitude ranges and nearly orthogonal viewing angles, enabling rigorous evaluation across global-to-local contexts. We benchmark state-of-the-art pose estimation models, including MASt3R, DUSt3R, Map Anything, and VGGT, and observe that the use of satellite imagery often degrades performance, highlighting the challenges under large altitude variations. We also examine reconstruction methods, highlighting the challenges introduced by sparse geometric overlap, varying perspectives, and the use of real imagery, which often introduces noise and reduces rendering quality. To address some of these challenges, we propose SkyNet, a model that enhances cross-view consistency when incorporating satellite imagery, using a curriculum-based training strategy that progressively incorporates more satellite views. SkyNet significantly strengthens multi-view alignment and outperforms existing methods by 9.6% on RRA@5 and 18.1% on RTA@5 in absolute terms. Sky2Ground and SkyNet together establish a comprehensive testbed and baseline for advancing large-scale, multi-altitude 3D perception and generalizable camera localization. Code and models will be released publicly for future research.
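The abstract reports gains on RRA@5 and RTA@5 without defining them. As a rough illustration, the sketch below follows the convention commonly used in the pose-estimation literature: for every image pair, compare predicted and ground-truth relative poses, and report the percentage of pairs whose rotation error (RRA) or translation-direction error (RTA) falls below a threshold in degrees. This is an assumption about the metric, not the paper's evaluation code; the function names, the world-to-camera pose convention, and the handling of translation sign are all hypothetical choices made here.

```python
import numpy as np
from itertools import combinations


def rotation_angle_deg(R_a, R_b):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def direction_angle_deg(t_a, t_b, eps=1e-12):
    """Angle in degrees between two translation directions (scale is ignored)."""
    cos = np.dot(t_a, t_b) / (np.linalg.norm(t_a) * np.linalg.norm(t_b) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def relative_pose(R_i, t_i, R_j, t_j):
    """Relative pose from camera i to camera j.

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + t.
    """
    R_ij = R_j @ R_i.T
    t_ij = t_j - R_ij @ t_i
    return R_ij, t_ij


def rra_rta_at(gt, pred, tau_deg=5.0):
    """Return (RRA@tau, RTA@tau) as percentages over all image pairs.

    gt and pred are lists of (R, t) tuples, one per image, in the same order.
    """
    rot_hits, trans_hits, n_pairs = 0, 0, 0
    for i, j in combinations(range(len(gt)), 2):
        R_gt, t_gt = relative_pose(*gt[i], *gt[j])
        R_pr, t_pr = relative_pose(*pred[i], *pred[j])
        rot_hits += rotation_angle_deg(R_gt, R_pr) < tau_deg
        trans_hits += direction_angle_deg(t_gt, t_pr) < tau_deg
        n_pairs += 1
    return 100.0 * rot_hits / n_pairs, 100.0 * trans_hits / n_pairs
```

Under this reading, "9.6% on RRA@5" would mean 9.6 percentage points more image pairs with a relative rotation error under 5 degrees. Some implementations additionally treat opposite translation directions as equivalent, which this sketch does not do.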

