CV · Dec 17, 2020

Learning to Recover 3D Scene Shape from a Single Image

arXiv:2012.09365v1 · 316 citations · Has Code
AI Analysis

This work is relevant to researchers and practitioners in computer vision and 3D reconstruction: it addresses limitations of monocular depth estimation that prevent accurate 3D scene shape recovery. The analysis rates the contribution as an incremental improvement.

This paper addresses the problem of recovering accurate 3D scene shape from a single image, which is hindered by unknown depth shifts and camera focal lengths in existing monocular depth estimation methods. The authors propose a two-stage framework that first predicts depth up to an unknown scale and shift, and then uses 3D point cloud encoders to predict the missing depth shift and focal length, achieving state-of-the-art performance on zero-shot dataset generalization across nine unseen datasets.
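To illustrate why an unknown depth shift matters for 3D shape, the following minimal sketch back-projects a depth map with a pinhole camera model. It is not the authors' code: the function names `unproject` and `collinearity_error`, the centred principal point, and the toy numbers are assumptions made for this example. Three pixels that see a straight slanted edge stay collinear in 3D when the depth is correct, but bend once a constant offset is added to the depth.

```python
def unproject(depth, focal_length, shift=0.0):
    """Back-project a depth map (a list of rows) to 3D points.

    Assumes a pinhole camera with square pixels and the principal
    point at the image centre. `shift` is added to every depth value.
    """
    h, w = len(depth), len(depth[0])
    u0, v0 = (w - 1) / 2.0, (h - 1) / 2.0
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = d + shift
            x = (u - u0) * z / focal_length
            y = (v - v0) * z / focal_length
            points.append((x, y, z))
    return points


def collinearity_error(p0, p1, p2):
    """Absolute cross product in the x-z plane; zero means the points are collinear."""
    return abs((p1[0] - p0[0]) * (p2[2] - p0[2]) - (p1[2] - p0[2]) * (p2[0] - p0[0]))


# Toy example: one image row that observes the 3D line z = 2 + x with f = 2.
true_depth = [[4.0 / 3.0, 2.0, 4.0]]

correct = unproject(true_depth, focal_length=2.0)
shifted = unproject(true_depth, focal_length=2.0, shift=1.0)

error_correct = collinearity_error(*correct)
error_shifted = collinearity_error(*shifted)
print(error_correct, error_shifted)
```

With the correct depth the error is zero up to floating-point rounding, while the shifted depth produces a clearly non-zero value, so the straight edge is reconstructed as a bent one. A wrong focal length distorts the shape in a similar way because it rescales x and y relative to z.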

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail, and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
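The phrase "depth up to an unknown scale and shift" means the prediction p relates to metric depth g roughly as g ≈ s·p + t for some unknown s and t. As a hedged illustration, the sketch below recovers s and t by closed-form least squares when ground-truth depth is available, which is one common way to compare such predictions against ground truth. It is not the paper's second stage, which predicts the shift from a single image with point cloud encoders and without ground truth. The function name `align_scale_shift` and the sample values are assumptions for this example.

```python
def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimising sum((s * p + t - g) ** 2).

    `pred` and `target` are equal-length sequences of depth values
    for the same pixels.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant, so the scale is undetermined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    scale = cov / var_p
    shift = mean_g - scale * mean_p
    return scale, shift


# Toy example: the prediction equals the ground truth after an affine change.
ground_truth = [1.0, 2.0, 4.0, 8.0]
prediction = [(g - 0.5) / 2.0 for g in ground_truth]

scale, shift = align_scale_shift(prediction, ground_truth)
aligned = [scale * p + shift for p in prediction]
print(scale, shift)
```

In this toy case the recovered values are approximately s = 2 and t = 0.5, and `aligned` matches `ground_truth`. At inference time no ground truth exists, which is why the shift and focal length have to be predicted rather than fitted.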

Code Implementations: 1 repo