CV · Mar 23, 2020

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

arXiv:2003.10432v3 · 342 citations
Originality: Highly original
AI Analysis

This paper addresses accurate 3D scene reconstruction for computer vision applications with an end-to-end approach that the authors report outperforms existing methods.

The paper tackles 3D scene reconstruction from posed RGB images by directly regressing a truncated signed distance function (TSDF), eliminating intermediate depth maps. On the ScanNet dataset, it reports quantitative and qualitative improvements over baselines that combine deep multiview stereo with traditional TSDF fusion.

We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images. Traditional approaches to 3D reconstruction rely on an intermediate representation of depth maps prior to estimating a full 3D model of a scene. We hypothesize that a direct regression to 3D is more effective. A 2D CNN extracts features from each image independently, which are then back-projected and accumulated into a voxel volume using the camera intrinsics and extrinsics. After accumulation, a 3D CNN refines the accumulated features and predicts the TSDF values. Additionally, semantic segmentation of the 3D model is obtained without significant computation. This approach is evaluated on the ScanNet dataset, where we significantly outperform state-of-the-art baselines (deep multiview stereo followed by traditional TSDF fusion) both quantitatively and qualitatively. We compare our 3D semantic segmentation to prior methods that use a depth sensor, since no previous work attempts the problem with only RGB input.
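
The back-projection and accumulation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `backproject_features` and `accumulate`, the nearest-pixel lookup, the axis-aligned voxel grid parameters (`origin`, `voxel_size`, `dims`), and the choice to average features over the views that see each voxel are all assumptions made for illustration. The 2D CNN and 3D CNN are omitted; the sketch covers only how per-image feature maps could be lifted into a shared voxel volume using the intrinsics `K` and the world-to-camera extrinsics.

```python
import numpy as np

def backproject_features(feats, K, world_to_cam, origin, voxel_size, dims):
    """Lift one 2D feature map into a voxel grid (illustrative sketch).

    feats:        (C, H, W) feature map from a 2D network (assumed given)
    K:            (3, 3) camera intrinsics
    world_to_cam: (4, 4) camera extrinsics (world -> camera)
    origin:       (3,) world position of the grid corner (hypothetical parameter)
    voxel_size:   edge length of one voxel
    dims:         (X, Y, Z) number of voxels per axis
    Returns a (C, X, Y, Z) volume and an (X, Y, Z) boolean visibility mask.
    """
    C, H, W = feats.shape
    X, Y, Z = dims

    # World coordinates of every voxel centre.
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    idx = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3)
    pts = np.asarray(origin, dtype=np.float64) + (idx + 0.5) * voxel_size

    # Transform voxel centres into the camera frame.
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T)[:3]
    z = cam[2]
    in_front = z > 1e-3
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth

    # Perspective projection to pixel coordinates (nearest pixel, an assumption).
    pix = K @ cam
    u = np.round(pix[0] / safe_z).astype(int)
    v = np.round(pix[1] / safe_z).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Copy the feature at each visible voxel's pixel; unseen voxels stay zero.
    vol = np.zeros((C, X * Y * Z), dtype=feats.dtype)
    vol[:, valid] = feats[:, v[valid], u[valid]]
    return vol.reshape(C, X, Y, Z), valid.reshape(X, Y, Z)

def accumulate(views, origin, voxel_size, dims):
    """Average back-projected features over all views that see each voxel."""
    sum_vol = None
    count = np.zeros(dims)
    for feats, K, world_to_cam in views:
        vol, valid = backproject_features(feats, K, world_to_cam, origin, voxel_size, dims)
        sum_vol = vol if sum_vol is None else sum_vol + vol
        count += valid
    return sum_vol / np.maximum(count, 1), count
```

In a full pipeline, the averaged volume would be passed to a 3D network that predicts a TSDF value per voxel. Every voxel along a camera ray receives the same feature here, which is why a learned 3D refinement stage is needed to resolve where the surface actually lies.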

Code Implementations: 1 repo
