CV · Mar 27, 2024

UniDepth: Universal Monocular Metric Depth Estimation

arXiv:2403.18913v1 · 405 citations · h-index: 28 · Has Code · CVPR
Originality: Incremental advance
AI Analysis

UniDepth addresses a practical limitation of existing MMDE methods, namely poor generalization across domains, which restricts their use in 3D perception and modeling. It is nonetheless an incremental improvement over prior work.

The paper tackles the failure of monocular metric depth estimation (MMDE) methods to generalize across domains. It proposes UniDepth, a model that predicts metric 3D points directly from a single image without any additional information. In zero-shot evaluation on ten datasets, UniDepth outperforms prior methods, including ones trained directly on the test domains.

Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at inference time without any additional information, striving for a universal and flexible MMDE solution. In particular, UniDepth implements a self-promptable camera module predicting dense camera representation to condition depth features. Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations. In addition, we propose a geometric invariance loss that promotes the invariance of camera-prompted depth features. Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth, even when compared with methods directly trained on the testing domains. Code and models are available at: https://github.com/lpiccinelli-eth/unidepth
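To make the idea of a pseudo-spherical output concrete, the sketch below shows one way per-pixel ray angles (the camera part) and log-depth (the depth part) could be combined into metric 3D points. It is an illustration under stated assumptions, not the authors' implementation: the pinhole camera model, the specific azimuth/elevation convention, and the function names `pixel_angles` and `to_metric_points` are all hypothetical choices made for this example.

```python
import numpy as np


def pixel_angles(fx, fy, cx, cy, height, width):
    """Azimuth and elevation of the ray through each pixel centre.

    Assumes a pinhole camera; the angle convention is an illustrative choice.
    """
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    x = (u - cx) / fx  # ray direction on the z = 1 plane
    y = (v - cy) / fy
    azimuth = np.arctan2(x, 1.0)                       # angle in the x-z plane
    elevation = np.arctan2(y, np.sqrt(x ** 2 + 1.0))   # angle out of that plane
    return azimuth, elevation


def to_metric_points(azimuth, elevation, log_depth):
    """Combine ray angles (camera) and log-depth (scene) into 3D points.

    Depth here is the z coordinate, not the radial distance, which is why
    the representation is only "pseudo" spherical.
    """
    z = np.exp(log_depth)
    x = z * np.tan(azimuth)
    y = z * np.tan(elevation) / np.cos(azimuth)
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)


# Example: a 4x4 image, focal length 2 px, every pixel at 2 m depth.
azimuth, elevation = pixel_angles(2.0, 2.0, 1.5, 1.5, 4, 4)
points = to_metric_points(azimuth, elevation, np.full((4, 4), np.log(2.0)))
```

In this sketch the angles depend only on the camera intrinsics and the log-depth only on the scene, so the two can be predicted and supervised separately, which is the disentanglement the abstract refers to.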

Code Implementations: 3 repos
