CV | AI | RO | Dec 1, 2021

MonoScene: Monocular 3D Semantic Scene Completion

arXiv:2112.00726v2 | 468 citations | Has Code
Originality: Highly original
AI Analysis

This work addresses 3D semantic scene completion from 2D images, a capability that matters for applications such as autonomous driving and robotics. Its approach is novel in the field: earlier scene completion methods rely on 2.5D or 3D input.

MonoScene tackles the problem of inferring dense 3D geometry and semantics from a single monocular RGB image, outperforming existing methods on all metrics and datasets while hallucinating plausible scenery beyond the camera's field of view.

MonoScene proposes a 3D Semantic Scene Completion (SSC) framework in which the dense geometry and semantics of a scene are inferred from a single monocular RGB image. Unlike the SSC literature, which relies on 2.5D or 3D input, we solve the complex problem of 2D-to-3D scene reconstruction while jointly inferring its semantics. Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D feature projection inspired by optics, and introduces a 3D context relation prior to enforce spatio-semantic consistency. Along with these architectural contributions, we introduce novel global scene and local frustum losses. Experiments show that we outperform the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view. Our code and trained models are available at https://github.com/cv-rits/MonoScene.
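The 2D-3D feature projection mentioned in the abstract can be pictured with a small sketch. This is not the released implementation; it is a minimal illustration that assumes a pinhole camera, a single feature scale, and nearest-pixel sampling. The names `project_point` and `lift_features` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and chosen here for illustration. Each voxel center is projected into the image and copies the 2D feature found there, so voxels on the same line of sight share a feature; voxels that project outside the image receive zeros in this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as feat2d[v][u][channel]


def project_point(p_cam: Vec3, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame 3D point to the pixel (u, v) containing it.

    Returns None for points at or behind the camera plane (z <= 0).
    """
    x, y, z = p_cam
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    return math.floor(u), math.floor(v)


def lift_features(feat2d: FeatureMap, voxel_centers: Sequence[Vec3],
                  fx: float, fy: float, cx: float, cy: float) -> List[List[float]]:
    """Assign each voxel the 2D feature at the pixel its center projects to.

    Assumption of this sketch: voxels outside the field of view get a zero vector.
    """
    height = len(feat2d)
    width = len(feat2d[0])
    channels = len(feat2d[0][0])
    lifted = []
    for center in voxel_centers:
        pix = project_point(center, fx, fy, cx, cy)
        if pix is None or not (0 <= pix[0] < width and 0 <= pix[1] < height):
            lifted.append([0.0] * channels)  # no image evidence for this voxel
        else:
            u, v = pix
            lifted.append(list(feat2d[v][u]))
    return lifted


if __name__ == "__main__":
    # Toy 2x2 feature map with one channel per pixel.
    feat = [[[1.0], [2.0]],
            [[3.0], [4.0]]]
    # Two voxels on the same ray, one far to the side of the image.
    centers = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (5.0, 0.0, 1.0)]
    print(lift_features(feat, centers, fx=1.0, fy=1.0, cx=1.0, cy=1.0))
```

In a full network, the lifted volume would then be processed by a 3D stage, which is where scenery outside the field of view would have to be inferred from context rather than sampled from the image.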

Code Implementations: 2 repos
