CV RO | Aug 3, 2025

CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes

Yaxuan Li, Yewei Huang, Bijay Gaudel, Hamidreza Jafarnejadsani, Brendan Englot

arXiv:2508.01936v1 | 8.42 citations | h-index: 10 | IROS

Originality: Incremental advance

AI Analysis

This work addresses sparse localization for real-world robotic applications such as aerial navigation. The advance is incremental: it builds on existing structure-from-motion and deep learning methods.

The paper tackles robust and accurate camera pose estimation across varied altitudes from sparse image input. The resulting system is reported to outperform existing solutions in both accuracy and robustness, as validated on two newly introduced datasets.

We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes when only considering sparse image input. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion into a unified framework. To benchmark our method and foster further research, we introduce two newly collected datasets specifically tailored for multi-altitude camera pose estimation; datasets of this nature remain rare in the current literature. The proposed framework has been validated through extensive comparative analyses on these datasets, demonstrating that our system achieves superior performance in both accuracy and robustness for multi-altitude sparse pose estimation tasks compared to existing solutions, making it well suited for real-world robotic applications such as aerial navigation, search and rescue, and automated inspection.
