DVD: Discrete Voxel Diffusion for 3D Generation and Editing
This work provides a novel discrete diffusion approach for 3D voxel generation that offers interpretability and editing capabilities, though it is somewhat incremental, since it adapts discrete diffusion to a specific domain.
DVD introduces a discrete diffusion framework for generating, assessing, and editing sparse voxels in 3D generative pipelines, achieving quality gains and enabling uncertainty estimation and editing without additional model evaluations.
We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent)-based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, DVD provides more interpretable generation dynamics through explicit categorical modeling. Furthermore, we use predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and challenging samples, facilitating tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This approach enables the model to inpaint and edit voxels within a single sampling round, requiring negligible auxiliary computation and no additional model evaluations.
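The abstract does not specify several details:

- the noise process,
- the per-voxel category set,
- how voxel entropies are aggregated into a sample-level score,
- the exact form of the block-structured perturbations.

The NumPy sketch below is a minimal illustration under stated assumptions. It shows two ingredients the abstract names:

- predictive entropy of per-voxel categorical predictions, used as an uncertainty signal;
- block-structured perturbation masks of the kind that could teach a model to inpaint contiguous regions.

The absorbing `[MASK]` token (class 2), the mean-entropy sample score, and the helpers `voxel_entropy`, `sample_uncertainty`, `block_mask`, and `corrupt` are hypothetical illustrations, not DVD's actual implementation.

```python
import numpy as np


def voxel_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-voxel predictive entropy (nats) of categorical predictions.

    probs: array of shape (..., K) whose last axis sums to 1.
    """
    # Clip only inside the log so zero-probability classes contribute 0 * log(eps) = 0.
    p = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(p), axis=-1)


def sample_uncertainty(probs: np.ndarray, region=None) -> float:
    """Hypothetical sample-level score: mean voxel entropy, optionally over a boolean region."""
    h = voxel_entropy(probs)
    if region is not None:
        h = h[region]
    return float(h.mean()) if h.size else 0.0


def block_mask(shape, num_blocks: int, max_size: int, rng) -> np.ndarray:
    """Hypothetical block-structured perturbation pattern: union of random axis-aligned boxes."""
    mask = np.zeros(shape, dtype=bool)
    for _ in range(num_blocks):
        # Random box extent per axis, never larger than the grid itself.
        size = np.minimum(rng.integers(1, max_size + 1, size=len(shape)), shape)
        start = [rng.integers(0, s - b + 1) for s, b in zip(shape, size)]
        mask[tuple(slice(st, st + b) for st, b in zip(start, size))] = True
    return mask


def corrupt(occupancy: np.ndarray, mask: np.ndarray, mask_token: int = 2) -> np.ndarray:
    """Assumed absorbing-state corruption: voxels inside the mask become [MASK]."""
    x = occupancy.astype(np.int64)  # 0 = empty, 1 = occupied
    x[mask] = mask_token
    return x


rng = np.random.default_rng(0)
occ = rng.random((16, 16, 16)) < 0.3        # toy binary occupancy grid
m = block_mask(occ.shape, num_blocks=2, max_size=4, rng=rng)
x_t = corrupt(occ, m)                        # model input for a block-inpainting objective
```

In this reading, voxels with near-uniform predicted probabilities have high entropy and can be flagged as ambiguous. Ranking samples by a pooled score such as `sample_uncertainty` is one plausible way to support data filtering. Contiguous boxes, rather than independently sampled voxels, make the corruption resemble a user-selected edit region, which is one possible motivation for block-structured patterns.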