CV | Jul 30, 2025

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, Zhuowen Tu

arXiv:2507.22825v1 | 14 citations | h-index: 8

Originality: Highly original

AI Analysis

This paper addresses accurate 3D scene reconstruction from a single image, a known bottleneck for applications in robotics and AR/VR, and proposes a novel method for it.

The paper tackles single-view 3D scene reconstruction by proposing DepR, a framework that uses depth guidance and instance-level diffusion to generate and compose objects into a coherent layout, achieving state-of-the-art performance with strong generalization on synthetic and real-world datasets.

We propose DepR, a depth-guided single-view scene reconstruction framework that integrates instance-level diffusion within a compositional paradigm. Instead of reconstructing the entire scene holistically, DepR generates individual objects and subsequently composes them into a coherent 3D layout. Unlike previous methods that use depth solely for object layout estimation during inference and therefore fail to fully exploit its rich geometric information, DepR leverages depth throughout both training and inference. Specifically, we introduce depth-guided conditioning to effectively encode shape priors into diffusion models. During inference, depth further guides DDIM sampling and layout optimization, enhancing alignment between the reconstruction and the input image. Despite being trained on limited synthetic data, DepR achieves state-of-the-art performance and demonstrates strong generalization in single-view scene reconstruction, as shown through evaluations on both synthetic and real-world datasets.
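To make the compositional idea concrete, here is a minimal sketch of how a depth map can anchor a generated object in the scene. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and it replaces the paper's layout optimization with a much simpler closed-form fit of an isotropic scale and a translation that matches centroid and RMS spread. Rotation and occlusion are ignored, and all function names are illustrative.

```python
import math

def backproject(depth, fx, fy, cx, cy, mask=None):
    """Lift depth-map pixels (list of rows, in metres) to camera-space 3D points.

    Assumes a pinhole camera; `mask` optionally selects one object's pixels.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid depth
            if mask is not None and not mask[v][u]:
                continue  # skip pixels outside the instance mask
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def centroid(points):
    """Mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rms_radius(points, c):
    """Root-mean-square distance of the points from centre `c`."""
    total = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(total / len(points))

def fit_scale_translation(object_points, target_points):
    """Return (s, t) so that s * p + t matches the target's centroid and RMS spread.

    A simplified stand-in for layout optimization (assumption, not the paper's method).
    """
    c_obj = centroid(object_points)
    c_tgt = centroid(target_points)
    s = rms_radius(target_points, c_tgt) / rms_radius(object_points, c_obj)
    t = tuple(c_tgt[i] - s * c_obj[i] for i in range(3))
    return s, t
```

In use, one would call `backproject` with an instance mask to obtain the depth points belonging to one object, then pass the generated object's canonical points and those depth points to `fit_scale_translation`. A real pipeline would also estimate rotation and account for the fact that depth only observes the visible surface.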
