CV · Oct 24, 2024

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

arXiv:2410.19115v3 · 253 citations · h-index: 11 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D scene reconstruction from single images for computer vision applications. It shows strong generalizability, but its improvements to supervision methods are incremental.

The paper tackles monocular 3D geometry estimation from open-domain images by introducing MoGe, which predicts affine-invariant 3D point maps and uses novel global and local supervision. The model significantly outperforms state-of-the-art methods on diverse unseen datasets across tasks like 3D point map, depth map, and camera field of view estimation.

We present MoGe, a powerful model for recovering 3D geometry from monocular open-domain images. Given a single image, our model directly predicts a 3D point map of the captured scene with an affine-invariant representation, which is agnostic to true global scale and shift. This new representation precludes ambiguous supervision in training and facilitates effective geometry learning. Furthermore, we propose a set of novel global and local geometry supervisions that empower the model to learn high-quality geometry. These include a robust, optimal, and efficient point cloud alignment solver for accurate global shape learning, and a multi-scale local geometry loss promoting precise local geometry supervision. We train our model on a large, mixed dataset and demonstrate its strong generalizability and high accuracy. In our comprehensive evaluation on diverse unseen datasets, our model significantly outperforms state-of-the-art methods across all tasks, including monocular estimation of 3D point map, depth map, and camera field of view. Code and models can be found on our project page.
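To make "agnostic to true global scale and shift" concrete, the following is a minimal sketch of aligning a predicted point map to ground truth before measuring error. It is not the paper's alignment solver: the abstract only describes that solver as robust, optimal, and efficient, so this sketch swaps in an ordinary closed-form least-squares fit of one scalar scale and one 3D shift. The function names `align_scale_shift` and `aligned_l1_error` are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def align_scale_shift(
    pred: Sequence[Point], target: Sequence[Point]
) -> Tuple[float, Point]:
    """Fit scalar s and 3D shift t minimizing sum ||s * pred_i + t - target_i||^2.

    Assumption: plain least squares stands in for the paper's solver.
    """
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two or more matching points")
    n = len(pred)
    # Centroids of both point sets.
    mean_pred = tuple(sum(p[k] for p in pred) / n for k in range(3))
    mean_tgt = tuple(sum(q[k] for q in target) / n for k in range(3))
    # Scale is the ratio of cross-covariance to predicted variance.
    num = 0.0
    den = 0.0
    for p, q in zip(pred, target):
        for k in range(3):
            dp = p[k] - mean_pred[k]
            num += dp * (q[k] - mean_tgt[k])
            den += dp * dp
    if den == 0.0:
        raise ValueError("predicted points are all identical")
    s = num / den
    # Shift maps the scaled predicted centroid onto the target centroid.
    t = tuple(mean_tgt[k] - s * mean_pred[k] for k in range(3))
    return s, t


def aligned_l1_error(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean per-point L1 error after applying the fitted scale and shift."""
    s, t = align_scale_shift(pred, target)
    total = 0.0
    for p, q in zip(pred, target):
        total += sum(abs(s * p[k] + t[k] - q[k]) for k in range(3))
    return total / len(pred)
```

If the target points are an exact scaled and shifted copy of the prediction, the fit recovers that scale and shift and the aligned error is zero up to floating-point rounding. This is why a prediction that is only correct up to global scale and shift is not penalized under this kind of supervision.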

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
