CV, AI · Dec 3, 2025

MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

Shaoheng Fang, Chaohui Yu, Fan Wang, Qixing Huang

arXiv:2512.04248v1 · 3.6 · h-index: 13

Originality: Highly original

AI Analysis

The paper addresses controllable 3D indoor scene generation for applications such as virtual reality and interior design. It proposes a new method aimed at a known bottleneck in this setting: keeping generated views consistent with one another.

The paper tackles controllable 3D indoor scene generation for novel view synthesis by introducing MVRoom, a two-stage pipeline that uses multi-view diffusion conditioned on coarse 3D layouts together with layout-aware epipolar attention. The authors report high-fidelity results that outperform state-of-the-art baselines both quantitatively and qualitatively.

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity and controllable 3D scene generation for NVS, outperforming state-of-the-art baseline methods both quantitatively and qualitatively. Ablation studies further validate the effectiveness of key components within our generation pipeline.
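The abstract names an epipolar attention mechanism but does not specify it. As a rough illustration of the underlying geometry only, the sketch below builds a plain epipolar attention mask between two views with NumPy: a pixel in view 1 is allowed to attend only to pixels in view 2 that lie near its epipolar line. The function names, the pixel threshold, and the pose convention (X2 = R @ X1 + t) are assumptions made for this example; the layout-aware part of MVRoom is not modeled, and this should not be read as the authors' implementation.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix for two pinhole cameras.

    Assumed convention: a point X1 in camera-1 coordinates maps to
    camera-2 coordinates as X2 = R @ X1 + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_distance(F, px1, px2):
    """Pixel distance from each point in px2 to the epipolar line of each point in px1.

    px1 has shape (N, 2), px2 has shape (M, 2); the result has shape (N, M).
    """
    h1 = np.hstack([px1, np.ones((len(px1), 1))])  # homogeneous pixels, view 1
    h2 = np.hstack([px2, np.ones((len(px2), 1))])  # homogeneous pixels, view 2
    lines = h1 @ F.T                               # row i is the line F @ p1_i in view 2
    num = np.abs(lines @ h2.T)
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    return num / np.maximum(den, 1e-12)

def epipolar_mask(F, px1, px2, threshold=2.0):
    """Boolean (N, M) mask: True where a view-2 pixel lies within `threshold`
    pixels of the epipolar line of a view-1 pixel (hypothetical threshold)."""
    return epipolar_distance(F, px1, px2) <= threshold

# Toy setup: identical intrinsics, second camera related by a pure x-translation.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
F = fundamental_matrix(K, K, np.eye(3), np.array([-0.5, 0.0, 0.0]))
query = np.array([[42.0, 27.0]])               # pixel in view 1
keys = np.array([[17.0, 27.0], [17.0, 37.0]])  # candidate pixels in view 2
mask = epipolar_mask(F, query, keys)
```

In an attention layer, a mask like this would typically be applied by setting the logits of disallowed query-key pairs to a large negative value before the softmax, so that cross-view attention concentrates on geometrically plausible correspondences.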
