GR · CV · May 16

VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

Haotian Mao, Yuhan Huang, Jiatao Lin, Yang Zhao, Hui Wang, Yiheng Zhang, Yuwang Wang, Chenliang Zhou, Yan Zhang, Fangcheng Zhong, Xubo Yang

arXiv:2605.17102


AI Analysis

This work addresses the problem of physically plausible 3D indoor scene arrangement for computer graphics and vision applications, offering a collision-free alternative to existing methods.

VoxScene introduces an anchor-conditioned voxel diffusion framework for 3D scene synthesis that uses explicit object-centric voxel representations to eliminate physical collisions and structural entanglement, achieving state-of-the-art physical plausibility and shape diversity.

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, which achieves state-of-the-art physical plausibility and greater shape diversity than existing layout planners.
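
The abstract's central argument is that discrete voxels are mutually exclusive: a voxel can belong to at most one object, so an arrangement stored this way cannot contain overlapping objects. The sketch below illustrates only that property, plus a simple shape-matching step for retrieval. It is not the VoxScene model and contains no diffusion component. The label-grid layout, the function names (place_object, box_mask, voxel_iou, retrieve), and the use of voxel IoU as the retrieval score are all assumptions made for illustration; the abstract does not specify how retrieval is scored.

```python
import numpy as np

EMPTY = 0  # label for unoccupied voxels (illustrative convention)


def box_mask(shape, lo, hi):
    """Boolean occupancy mask for an axis-aligned box [lo, hi) in voxel units."""
    mask = np.zeros(shape, dtype=bool)
    mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return mask


def place_object(scene, mask, object_id):
    """Write object_id into the label grid wherever mask is True.

    Each voxel stores a single label, so a placement is accepted only if
    every voxel it needs is still EMPTY. Returns True on success and
    False (leaving the scene unchanged) on a conflict.
    """
    if object_id == EMPTY:
        raise ValueError("object_id must differ from the EMPTY label")
    if mask.shape != scene.shape:
        raise ValueError("mask and scene must have the same shape")
    if np.any(scene[mask] != EMPTY):
        return False
    scene[mask] = object_id
    return True


def voxel_iou(a, b):
    """Intersection over union of two boolean occupancy grids."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum()) / float(union)


def retrieve(query, library):
    """Return the name of the library shape with the highest IoU to query.

    Assumption: IoU stands in for whatever matching score the paper uses.
    """
    return max(library, key=lambda name: voxel_iou(query, library[name]))


if __name__ == "__main__":
    scene = np.zeros((8, 8, 8), dtype=np.int32)
    bed = box_mask(scene.shape, (0, 0, 0), (4, 2, 6))
    table_bad = box_mask(scene.shape, (3, 0, 0), (5, 2, 2))   # overlaps the bed
    table_ok = box_mask(scene.shape, (5, 0, 0), (7, 2, 2))    # free space

    print(place_object(scene, bed, 1))        # accepted
    print(place_object(scene, table_bad, 2))  # rejected: voxels already taken
    print(place_object(scene, table_ok, 2))   # accepted
```

In this toy version, a conflicting placement is simply rejected. A generative model that emits one label per voxel would avoid such conflicts by construction, which is the property the abstract relies on.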
