RO · AI · Sep 9, 2025

Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation

arXiv:2509.08159v1 · 2 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient collision avoidance on autonomous drones by reducing reliance on heavy sensors and data-intensive fine-tuning. The advance is incremental because it builds on existing visual-inertial and depth estimation techniques.

The paper tackles metric depth prediction from monocular RGB images and an IMU for autonomous aerial navigation. It proposes lightweight zero-shot rescaling strategies that convert relative depth estimates into metric depth. The best method produces on-board estimates at 15 Hz and enabled successful collision avoidance in real-world tests.
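
To make the idea of rescaling concrete, here is a minimal sketch of two simple global strategies: a median-ratio scale and a least-squares scale-and-shift fit. It is not taken from the paper. The function names `median_scale` and `fit_scale_shift` are hypothetical, and the sketch assumes the relative values are depth-like (larger means farther) and have already been sampled at the pixels where the visual-inertial system provides sparse metric depths. If the depth network outputs inverse depth instead, the same fit would presumably be applied against inverse metric depth.

```python
from statistics import median


def median_scale(rel, metric):
    """Single scale factor s such that metric ~ s * rel (robust to outliers)."""
    ratios = [m / r for r, m in zip(rel, metric) if r > 0.0]
    if not ratios:
        raise ValueError("no samples with positive relative depth")
    return median(ratios)


def fit_scale_shift(rel, metric):
    """Closed-form least-squares fit of metric ~ a * rel + b."""
    n = len(rel)
    if n < 2 or n != len(metric):
        raise ValueError("need at least two matched samples")
    mean_r = sum(rel) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in rel)
    if var_r == 0.0:
        raise ValueError("relative depths are all identical")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(rel, metric))
    a = cov / var_r
    b = mean_m - a * mean_r
    return a, b


# Illustrative values only: relative depth and metric depth at four features.
rel_at_features = [0.1, 0.2, 0.4, 0.8]
metric_at_features = [0.7, 0.9, 1.3, 2.1]
a, b = fit_scale_shift(rel_at_features, metric_at_features)

# Apply the fitted mapping to other relative depth values in the image.
metric_depth = [a * r + b for r in [0.3, 0.6]]
```

A global linear fit like this is cheap, but it cannot correct nonlinear distortion in the relative depth, which is one plausible motivation for more flexible monotone mappings.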

This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or data-intensive and domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via the sparse 3D feature map created using a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion primitives-based planner.
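
The abstract names monotonic spline fitting as the best performing strategy but does not give its details here. As a rough illustration of what a monotone relative-to-metric mapping can look like, the sketch below swaps in a simpler technique: isotonic regression (pool adjacent violators) followed by clamped piecewise-linear interpolation. This is a stand-in under stated assumptions, not the authors' spline. The names `fit_monotone_map` and `apply_monotone_map` are hypothetical, and relative values are again assumed to increase with distance.

```python
from bisect import bisect_right


def fit_monotone_map(rel, metric):
    """Isotonic regression of metric depth on relative depth.

    Returns knot lists (knots_x, knots_y) with strictly increasing x
    and strictly increasing y.
    """
    if not rel or len(rel) != len(metric):
        raise ValueError("need matched, non-empty samples")
    # Each block stores [sum_of_metric, count, x_min, x_max].
    blocks = []
    for x, y in sorted(zip(rel, metric)):
        blocks.append([y, 1, x, x])
        # Merge while the previous block violates monotonicity
        # or shares the same relative depth value.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]
            or blocks[-2][3] == blocks[-1][2]
        ):
            s, c, _, x_max = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][3] = x_max
    knots_x = [(b[2] + b[3]) / 2 for b in blocks]
    knots_y = [b[0] / b[1] for b in blocks]
    return knots_x, knots_y


def apply_monotone_map(knots_x, knots_y, r):
    """Piecewise-linear interpolation, clamped outside the knot range."""
    if r <= knots_x[0]:
        return knots_y[0]
    if r >= knots_x[-1]:
        return knots_y[-1]
    i = bisect_right(knots_x, r)
    x0, x1 = knots_x[i - 1], knots_x[i]
    y0, y1 = knots_y[i - 1], knots_y[i]
    return y0 + (r - x0) / (x1 - x0) * (y1 - y0)


# Illustrative values only: the third sample breaks monotonicity.
kx, ky = fit_monotone_map([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
depths = [apply_monotone_map(kx, ky, r) for r in [0.0, 1.75, 2.5, 10.0]]
```

Enforcing monotonicity preserves the depth ordering predicted by the relative depth network while still allowing a nonlinear correction. A smooth monotone spline through similar knots would avoid the kinks of the piecewise-linear version.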

