CV · AI · Dec 3, 2025

MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models

arXiv:2512.04248v1 · h-index: 13
Originality: Highly original
AI Analysis

This work addresses controllable 3D indoor scene generation for applications such as virtual reality and design, proposing a new method for a known bottleneck: multi-view consistency.

The paper tackles controllable 3D indoor scene generation for novel view synthesis by introducing MVRoom, a two-stage pipeline that applies multi-view diffusion conditioned on coarse 3D layouts and adds a layout-aware epipolar attention mechanism. It reports high-fidelity results that outperform state-of-the-art baselines both quantitatively and qualitatively.

We introduce MVRoom, a controllable novel view synthesis (NVS) pipeline for 3D indoor scenes that uses multi-view diffusion conditioned on a coarse 3D layout. MVRoom employs a two-stage design in which the 3D layout is used throughout to enforce multi-view consistency. The first stage employs novel representations to effectively bridge the 3D layout and consistent image-based condition signals for multi-view generation. The second stage performs image-conditioned multi-view generation, incorporating a layout-aware epipolar attention mechanism to enhance multi-view consistency during the diffusion process. Additionally, we introduce an iterative framework that generates 3D scenes with varying numbers of objects and scene complexities by recursively performing multi-view generation (MVRoom), supporting text-to-scene generation. Experimental results demonstrate that our approach achieves high-fidelity and controllable 3D scene generation for NVS, outperforming state-of-the-art baseline methods both quantitatively and qualitatively. Ablation studies further validate the effectiveness of key components within our generation pipeline.
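The summary does not say how MVRoom's layout-aware epipolar attention is implemented. As a rough illustration of the generic idea it builds on, the sketch below restricts cross-view attention so that each pixel in one view attends only to pixels near its epipolar line in another view, using standard two-view geometry (fundamental matrix F = K2^-T [t]x R K1^-1). All function names are hypothetical, the camera convention X2 = R X1 + t is an assumption, and the layout-aware part of MVRoom is not modeled at all.

```python
import numpy as np


def skew(t: np.ndarray) -> np.ndarray:
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix F with x2^T F x1 = 0 for corresponding pixels.

    Assumed convention (hypothetical helper): a 3D point maps from camera-1
    to camera-2 coordinates as X2 = R @ X1 + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)


def epipolar_mask(F, h, w, threshold=1.0):
    """(h*w, h*w) boolean mask: entry [i, j] is True when pixel j of view 2
    lies within `threshold` pixels of the epipolar line of pixel i of view 1."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)  # homogeneous (x, y, 1)
    lines = pix @ F.T  # row i holds the line F @ pix[i] = (a, b, c)
    # Point-to-line distance |a*x + b*y + c| / sqrt(a^2 + b^2) for every (i, j) pair.
    dist = np.abs(lines @ pix.T) / (np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8)
    return dist <= threshold


def epipolar_attention(q, k, v, mask):
    """Single-head scaled dot-product attention from view-1 queries to
    view-2 keys, with disallowed (query, key) pairs masked out."""
    # A query whose epipolar line misses view 2 entirely would have no valid
    # key; let such rows attend to all keys instead of producing NaNs.
    mask = mask | ~mask.any(axis=1, keepdims=True)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    h, w, d = 4, 5, 8
    K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]])
    # Identical cameras, pure horizontal translation (a rectified stereo pair).
    F = fundamental_matrix(K, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
    mask = epipolar_mask(F, h, w, threshold=0.5)
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((h * w, d)) for _ in range(3))
    out = epipolar_attention(q, k, v, mask)
    print(mask.sum(axis=1))  # number of keys each query may attend to
    print(out.shape)
```

With a pure horizontal translation between identical cameras, every epipolar line is an image row, so the mask can be checked by hand: each query attends only to keys in its own row. Real multi-view diffusion models typically apply such masks to downsampled feature tokens rather than full-resolution pixels, and how MVRoom makes the constraint layout-aware is beyond what this summary describes.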
