CV · LG · Mar 7, 2025

Fake It To Make It: Virtual Multiviews to Enhance Monocular Indoor Semantic Scene Completion

arXiv:2503.05086v1 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses depth and occlusion ambiguities in indoor 3D scene understanding, with applications such as robotics and AR/VR. It is an incremental advance: views synthesized from virtual cameras around a single input image are fused into one 3D semantic occupancy map.

The paper tackles the problem of monocular indoor semantic scene completion (SSC) by using virtual multiview synthesis to enhance 3D reconstruction from a single RGB image, achieving improvements of up to 2.8% in Scene Completion and 4.9% in Semantic Scene Completion IoU scores on the NYUv2 dataset.
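The two reported scores are both intersection-over-union measures on a voxel grid. The sketch below shows one common way to compute them, assuming a flattened grid of integer labels where label 0 means empty space; that label convention and the function names are illustrative assumptions, not taken from the paper. Benchmark protocols usually also restrict evaluation to a subset of voxels, and that masking is omitted here.

```python
from typing import Sequence

EMPTY = 0  # assumed label for free space (illustrative convention)


def scene_completion_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Binary occupancy IoU: a voxel is occupied if its label is not EMPTY."""
    pairs = list(zip(pred, target))
    inter = sum(1 for p, t in pairs if p != EMPTY and t != EMPTY)
    union = sum(1 for p, t in pairs if p != EMPTY or t != EMPTY)
    return inter / union if union else 1.0


def semantic_miou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean per-class IoU over non-empty classes that occur in pred or target."""
    pairs = list(zip(pred, target))
    ious = []
    for c in range(1, num_classes):
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:  # skip classes absent from both grids
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 1.0


# Toy grid of six voxels with classes {0: empty, 1, 2}.
target = [0, 1, 1, 2, 2, 0]
pred = [0, 1, 2, 2, 2, 1]
print(scene_completion_iou(pred, target))  # 4 of 5 occupied voxels overlap
print(semantic_miou(pred, target, num_classes=3))
```

Scene Completion ignores which class a voxel has and only checks occupancy, so a prediction can score well there while still mislabeling objects; the semantic score penalizes those label errors per class.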

Monocular Indoor Semantic Scene Completion (SSC) aims to reconstruct a 3D semantic occupancy map from a single RGB image of an indoor scene, inferring spatial layout and object categories from 2D image cues. The challenge of this task arises from the depth, scale, and shape ambiguities that emerge when transforming a 2D image into 3D space, particularly within the complex and often heavily occluded environments of indoor scenes. Current SSC methods often struggle with these ambiguities, resulting in distorted or missing object representations. To overcome these limitations, we introduce an innovative approach that leverages novel view synthesis and multiview fusion. Specifically, we demonstrate how virtual cameras can be placed around the scene to emulate multiview inputs that enhance contextual scene information. We also introduce a Multiview Fusion Adaptor (MVFA) to effectively combine the multiview 3D scene predictions into a unified 3D semantic occupancy map. Finally, we identify and study the inherent limitation of generative techniques when applied to SSC, specifically the Novelty-Consistency tradeoff. Our system, GenFuSE, demonstrates IoU score improvements of up to 2.8% for Scene Completion and 4.9% for Semantic Scene Completion when integrated with existing SSC networks on the NYUv2 dataset. This work introduces GenFuSE as a standard framework for advancing monocular SSC with synthesized inputs.
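To make the fusion step concrete, here is a deliberately simple stand-in: a weighted average of per-voxel class probabilities across views, followed by an argmax. This is not the paper's Multiview Fusion Adaptor, whose design is not described in this summary; it only illustrates what combining several per-view 3D predictions into one occupancy map can look like. It assumes every view's prediction has already been resampled into a common reference voxel grid, and the function name and weights are hypothetical.

```python
from typing import List, Optional, Sequence

Probs = List[List[float]]  # one view: [voxel][class] probabilities


def fuse_views(views: Sequence[Probs],
               weights: Optional[Sequence[float]] = None) -> List[int]:
    """Weighted-average fusion of per-view class probabilities (naive baseline).

    Assumes all views share the same voxel ordering and class set.
    Returns one class label per voxel.
    """
    if not views:
        raise ValueError("need at least one view")
    if weights is None:
        weights = [1.0] * len(views)
    if len(weights) != len(views):
        raise ValueError("one weight per view is required")

    total = sum(weights)
    num_voxels = len(views[0])
    num_classes = len(views[0][0])

    labels = []
    for v in range(num_voxels):
        fused = [
            sum(w * view[v][c] for w, view in zip(weights, views)) / total
            for c in range(num_classes)
        ]
        # Pick the class with the highest fused probability.
        labels.append(max(range(num_classes), key=fused.__getitem__))
    return labels


# Two voxels, three classes: the real view and one synthesized view.
original = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]
virtual = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

print(fuse_views([original, virtual]))                  # equal trust in both views
print(fuse_views([original, virtual], weights=[3, 1]))  # favour the real view
```

In the toy example the fused labels change with the weights, which hints at why fusion matters: a synthesized view can add context the original view lacks, but it can also contradict it. Fixed weights are a crude way to manage that; a learned fusion module is one alternative.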

