A Modular Framework for Single-View 3D Reconstruction of Indoor Environments
This work addresses the problem of reconstructing 3D indoor environments from a single image, for applications such as interior design and augmented reality. It represents a strong, specific gain rather than a foundational breakthrough.
The paper tackles single-view 3D reconstruction of indoor scenes with a modular framework that splits the process into two steps: diffusion-based techniques first predict complete views of the room background and occluded instances, and these completed views are then transformed into 3D. Experiments on the 3D-Front dataset show that the method outperforms state-of-the-art approaches in both visual quality and reconstruction accuracy.
We propose a modular framework for single-view 3D reconstruction of indoor scenes, in which several core modules are powered by diffusion techniques. Traditional approaches to this task often struggle with the complex instance shapes and occlusions inherent in indoor environments. They frequently overreach by attempting to predict 3D shapes directly from incomplete 2D images, which limits reconstruction quality. We aim to overcome this limitation by splitting the process into two steps: first, we employ diffusion-based techniques to predict complete views of the room background and of occluded indoor instances; second, we transform these completed views into 3D. Our framework contributes four components: an amodal completion module that restores the full view of occluded instances, an inpainting model trained specifically to predict room layouts, a hybrid depth estimation technique that balances overall geometric accuracy with fine detail expressiveness, and a view-space alignment method that exploits both 2D and 3D cues to place instances precisely within the scene. Together, these components reconstruct both the foreground instances and the room background from a single image. Extensive experiments on the 3D-Front dataset demonstrate that our method outperforms current state-of-the-art (SOTA) approaches in both visual quality and reconstruction accuracy. The framework holds promising potential for applications in interior design, real estate, and augmented reality.
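To make the idea of combining a geometrically accurate depth estimate with a detail-rich one more concrete, the following is a minimal sketch and not the paper's hybrid depth technique, which the abstract does not specify. It assumes one common approach: fitting a global scale and shift by least squares so that a detailed relative depth map agrees with a coarser metric depth map, then back-projecting the result into 3D with a pinhole camera model. The function names `align_scale_shift` and `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, introduced here for illustration only.

```python
import numpy as np


def align_scale_shift(relative_depth, metric_depth, mask=None):
    """Fit s, t by least squares so that s * relative_depth + t ~ metric_depth.

    Assumption: the relative map carries fine detail and the metric map
    carries correct overall geometry; this is an illustrative stand-in,
    not the method described in the abstract.
    """
    if mask is None:
        mask = np.ones(relative_depth.shape, dtype=bool)
    x = relative_depth[mask].ravel()
    y = metric_depth[mask].ravel()
    # Design matrix [x, 1] for the linear model y = s * x + t.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * relative_depth + t, s, t


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud (pinhole model)."""
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    relative = rng.random((4, 6))
    metric = 2.5 * relative + 0.7  # synthetic "ground truth" relation
    aligned, s, t = align_scale_shift(relative, metric)
    points = backproject(aligned, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
    print(points.shape)
```

In a pipeline like the one described above, a step of this kind would sit between the 2D completion modules and the 3D placement of instances. A single global scale and shift is the simplest possible choice; per-instance or per-region fits are a natural variation.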