CV, GR, LG | Dec 6, 2023

XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

U of Toronto
arXiv:2312.03806v2 | 179 citations | h-index: 13 | CVPR
Originality: Highly original
AI Analysis

This work addresses the challenge of scalable 3D content creation for gaming, simulation, and virtual reality. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles efficient generation of high-resolution 3D voxel grids. It produces grids with an effective resolution of up to 1024^3 without test-time optimization, and demonstrates applications to large outdoor scenes and to tasks such as text-to-3D.

We present XCube (abbreviated as $\mathcal{X}^3$), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes. Our model can generate millions of voxels with a finest effective resolution of up to $1024^3$ in a feed-forward fashion without time-consuming test-time optimization. To achieve this, we employ a hierarchical voxel latent diffusion model which generates progressively higher resolution grids in a coarse-to-fine manner using a custom framework built on the highly efficient VDB data structure. Apart from generating high-resolution objects, we demonstrate the effectiveness of XCube on large outdoor scenes at scales of 100m$\times$100m with a voxel size as small as 10cm. We observe clear qualitative and quantitative improvements over past approaches. In addition to unconditional generation, we show that our model can be used to solve a variety of tasks such as user-guided editing, scene completion from a single scan, and text-to-3D. The source code and more results can be found at https://research.nvidia.com/labs/toronto-ai/xcube/.
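To make the coarse-to-fine idea concrete, here is a minimal toy sketch in Python. It is not the authors' model or their VDB-based framework: the learned diffusion step is replaced by a hand-written predicate (`near_sphere_surface`, a hypothetical stand-in), and the sparse grid is a plain Python set of integer coordinates. It only illustrates why a sparse hierarchy helps: a dense 1024^3 grid has 2^30 (about 1.07 billion) cells, whereas each level here subdivides only the voxels that were kept at the previous level.

```python
from typing import Callable, Set, Tuple

Voxel = Tuple[int, int, int]


def near_sphere_surface(v: Voxel, resolution: int) -> bool:
    """Hypothetical stand-in for a learned occupancy decision.

    Keeps a voxel if its centre lies within one voxel width of a sphere
    of radius 0.8 centred in the cube [-1, 1]^3.
    """
    cx, cy, cz = ((c + 0.5) / resolution * 2.0 - 1.0 for c in v)
    r = (cx * cx + cy * cy + cz * cz) ** 0.5
    return abs(r - 0.8) <= 2.0 / resolution


def subdivide(coarse: Set[Voxel],
              keep: Callable[[Voxel, int], bool],
              resolution: int) -> Set[Voxel]:
    """Split every occupied voxel into 8 children and keep the accepted ones.

    `resolution` is the grid resolution of the children (twice the parents').
    """
    fine: Set[Voxel] = set()
    for x, y, z in coarse:
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    child = (2 * x + dx, 2 * y + dy, 2 * z + dz)
                    if keep(child, resolution):
                        fine.add(child)
    return fine


def generate(levels: int, base: int = 4) -> Tuple[Set[Voxel], int]:
    """Build a sparse grid coarse-to-fine, doubling the resolution per level."""
    resolution = base
    grid = {
        (x, y, z)
        for x in range(base)
        for y in range(base)
        for z in range(base)
        if near_sphere_surface((x, y, z), base)
    }
    for _ in range(levels):
        resolution *= 2
        grid = subdivide(grid, near_sphere_surface, resolution)
    return grid, resolution


if __name__ == "__main__":
    grid, resolution = generate(levels=3)
    dense = resolution ** 3
    print(f"resolution {resolution}^3: {len(grid)} occupied of {dense} cells")
```

In this toy setting the occupied set at the finest level is a small fraction of the dense grid, and the work per level scales with the number of occupied voxels rather than with the full cube. As a scale check on the abstract's outdoor setting, 100 m at a 10 cm voxel size is 1000 voxels per side.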

Code Implementations: 1 repo