CV · Sep 28, 2025

RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization

Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha

arXiv:2509.23991v1 · 11.83 citations · h-index: 6

Originality: Incremental advance

AI Analysis

The work addresses the scarcity of labeled data for 360 depth estimation with a training-free solution that could benefit applications in domains such as VR and robotics. Its originality is incremental because it builds on existing foundation models.

The paper tackles robust, training-free depth estimation for 360 images. It applies a perspective foundation model to each cubemap face and uses graph optimization to align depth scales across faces, achieving superior performance on datasets such as Matterport3D. It also improves downstream tasks, raising AUC@5 by 3.2-5.4% for feature matching and by 0.2-9.7% for Structure from Motion.
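To make the scale-alignment idea concrete, here is a minimal sketch of per-face scale estimation posed as least squares over a graph whose nodes are cube faces and whose edges are adjacent faces. It is a deliberately simplified stand-in, not the paper's method: RPG360 also parameterizes depth and normal maps inside its graph optimization, whereas this sketch only solves for one scale per face. It assumes that, for each pair of adjacent faces, we already have depths both faces predict for the same 3D rays near their shared boundary. The function `align_face_scales`, the face ordering, and `CUBE_EDGES` are hypothetical names for illustration.

```python
import numpy as np

# Hypothetical face order: 0 front, 1 right, 2 back, 3 left, 4 up, 5 down.
# Each pair below is a shared cube edge (12 in total).
CUBE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0),
              (4, 0), (4, 1), (4, 2), (4, 3),
              (5, 0), (5, 1), (5, 2), (5, 3)]


def align_face_scales(num_faces, edge_samples, anchor=0):
    """Least-squares per-face scales s_f so that s_i * d_i ~= s_j * d_j on shared rays.

    edge_samples: iterable of (i, j, depth_i, depth_j), where depth_i[k] and
    depth_j[k] are the depths faces i and j predict for the same ray k (assumed given).
    In log space each pair yields the linear equation
        log s_i - log s_j = log d_j - log d_i.
    The global scale is unobservable, so the anchor face is fixed to s = 1.
    """
    free = [f for f in range(num_faces) if f != anchor]
    col = {f: c for c, f in enumerate(free)}  # face index -> unknown column
    blocks, rhs = [], []
    for i, j, di, dj in edge_samples:
        log_di = np.log(np.asarray(di, dtype=float))
        log_dj = np.log(np.asarray(dj, dtype=float))
        block = np.zeros((log_di.size, len(free)))
        if i != anchor:
            block[:, col[i]] = 1.0
        if j != anchor:
            block[:, col[j]] -= 1.0
        blocks.append(block)
        rhs.append(log_dj - log_di)
    A = np.vstack(blocks)
    b = np.concatenate(rhs)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # solve for free log-scales
    log_s = np.zeros(num_faces)
    log_s[free] = x
    return np.exp(log_s)


# Synthetic check: face f predicts depth / true_scale[f], so s_f = true_scale[f]
# (relative to the anchor face) brings all faces back to a common scale.
rng = np.random.default_rng(0)
true_scale = np.array([1.0, 2.0, 0.5, 1.5, 3.0, 0.8])
samples = []
for i, j in CUBE_EDGES:
    depth = rng.uniform(1.0, 10.0, size=32)  # shared depth along boundary rays
    samples.append((i, j, depth / true_scale[i], depth / true_scale[j]))
scales = align_face_scales(6, samples)
print(np.round(scales, 3))
```

Working in log space turns the multiplicative scale ambiguity into a linear system, and fixing one face removes the global-scale null space. Because the cube adjacency graph is connected, the remaining five unknowns are fully determined.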

The increasing use of 360 images across various domains has emphasized the need for robust depth estimation techniques tailored for omnidirectional images. However, obtaining large-scale labeled datasets for 360 depth estimation remains a significant challenge. In this paper, we propose RPG360, a training-free robust 360 monocular depth estimation method that leverages perspective foundation models and graph optimization. Our approach converts 360 images into six-face cubemap representations, where a perspective foundation model is employed to estimate depth and surface normals. To address depth scale inconsistencies across different faces of the cubemap, we introduce a novel depth scale alignment technique using graph-based optimization, which parameterizes the predicted depth and normal maps while incorporating an additional per-face scale parameter. This optimization ensures depth scale consistency across the six-face cubemap while preserving 3D structural integrity. Furthermore, as foundation models exhibit inherent robustness in zero-shot settings, our method achieves superior performance across diverse datasets, including Matterport3D, Stanford2D3D, and 360Loc. We also demonstrate the versatility of our depth estimation approach by validating its benefits in downstream tasks, improving feature matching by 3.2-5.4% and Structure from Motion by 0.2-9.7% in AUC@5.
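The first step described above, converting a 360 image into six cubemap faces, can be sketched as resampling an equirectangular panorama through six 90° field-of-view pinhole cameras. The abstract does not state the projection conventions, so everything here is an assumption: a y-down, z-forward axis convention, the face orientations in the hypothetical `FACES` table, and nearest-neighbour sampling. The function names are likewise illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical convention: +x right, +y down, +z forward (front face).
# Each face: (forward, image-right, image-down) axes in world coordinates,
# chosen so that right x down = forward for every face.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    "down": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
}


def face_directions(face, size):
    """Unit ray directions, shape (size, size, 3), for a 90-degree FoV cube face."""
    fwd, right, down = (np.asarray(v, dtype=float) for v in FACES[face])
    # Pixel centers mapped to (-1, 1); focal length size/2 gives a 90-degree FoV.
    t = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(t, t)  # u varies along columns, v along rows
    d = fwd + u[..., None] * right + v[..., None] * down
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def directions_to_equirect(d, width, height):
    """Map unit directions to continuous equirectangular pixel coordinates."""
    lon = np.arctan2(d[..., 0], d[..., 2])  # 0 at +z, range [-pi, pi]
    lat = np.arcsin(np.clip(-d[..., 1], -1.0, 1.0))  # +pi/2 looking up (-y)
    x = (lon / (2 * np.pi) + 0.5) * width - 0.5
    y = (0.5 - lat / np.pi) * height - 0.5
    return x, y


def equirect_to_cubeface(img, face, size):
    """Nearest-neighbour resampling of one cube face from an equirectangular image."""
    h, w = img.shape[:2]
    x, y = directions_to_equirect(face_directions(face, size), w, h)
    xi = np.round(x).astype(int) % w  # wrap around in longitude
    yi = np.clip(np.round(y).astype(int), 0, h - 1)
    return img[yi, xi]


# Toy panorama whose value is its column index, so sampled values reveal longitude.
pano = np.tile(np.arange(128, dtype=float), (64, 1))
cubemap = {name: equirect_to_cubeface(pano, name, 32) for name in FACES}
front = cubemap["front"]
```

Each face in `cubemap` is an ordinary perspective image, which is what lets a perspective depth and normal model run on it unchanged. In practice bilinear interpolation would usually replace the nearest-neighbour lookup used here for brevity.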

