CV · May 18, 2024

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

arXiv:2405.11252v1 · 12 citations · h-index: 20 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses stability and quality issues in high-resolution text-to-3D generation, a problem relevant to graphics and AI applications. The advance is incremental, since it builds on existing methods such as ISM and Stable Diffusion XL.

The authors address pseudo ground truth inconsistency in text-to-3D generation by proposing Trajectory Score Matching (TSM), which reduces the error accumulated during DDIM inversion. They also introduce pixel-by-pixel gradient clipping to stabilize 3D Gaussian splatting under Stable Diffusion XL guidance, and report significant improvements in visual quality and performance over state-of-the-art models.
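For readers unfamiliar with DDIM inversion, the following is a minimal NumPy sketch of the standard deterministic inversion update, not the authors' code. The function names, the `alpha_bars` schedule, and the toy noise predictor are all hypothetical. Each step reuses the noise predicted at the current, less noisy level as a stand-in for the noise at the next level, and that approximation is a commonly cited source of error that grows with the number of steps.

```python
import numpy as np

def ddim_invert_step(x_s, eps_pred, alpha_bar_s, alpha_bar_t):
    """One deterministic DDIM inversion step from level s to a noisier level t.

    eps_pred is the noise predicted at x_s; using it in place of the noise
    at x_t is the approximation that introduces per-step error.
    """
    # Estimate the clean sample implied by x_s and the predicted noise.
    x0_pred = (x_s - np.sqrt(1.0 - alpha_bar_s) * eps_pred) / np.sqrt(alpha_bar_s)
    # Re-noise that estimate to the target level t with the same noise.
    return np.sqrt(alpha_bar_t) * x0_pred + np.sqrt(1.0 - alpha_bar_t) * eps_pred

def ddim_invert(x0, eps_model, alpha_bars):
    """Invert x0 along a decreasing schedule of cumulative alphas.

    alpha_bars[0] is assumed to be 1.0 (the clean sample).
    eps_model(x, alpha_bar) is a hypothetical noise predictor.
    Returns the list of intermediate samples, including x0.
    """
    x = np.asarray(x0, dtype=np.float64)
    trajectory = [x]
    for a_s, a_t in zip(alpha_bars[:-1], alpha_bars[1:]):
        eps = eps_model(x, a_s)
        x = ddim_invert_step(x, eps, a_s, a_t)
        trajectory.append(x)
    return trajectory

# Toy usage: a constant "noise predictor" stands in for a diffusion model.
alpha_bars = [1.0, 0.9, 0.6, 0.3]
x0 = np.array([1.0, -2.0])
const_eps = np.array([0.5, 0.5])
traj = ddim_invert(x0, lambda x, a: const_eps, alpha_bars)
```

With a real noise predictor the prediction changes from step to step, so the inverted trajectory drifts away from the one the forward process would produce.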

In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM, which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversion process of DDIM to generate two paths from the same starting point for calculation. Since both paths start from the same starting point, TSM can reduce the accumulated error compared to ISM, thus alleviating the problem of pseudo ground truth inconsistency. TSM enhances the stability and consistency of the model's generated paths during the distillation process. We demonstrate this experimentally and further show that ISM is a special case of TSM. Furthermore, to optimize the current multi-stage optimization process for high-resolution text-to-3D generation, we adopt Stable Diffusion XL for guidance. In response to the issues of abnormal replication and splitting caused by unstable gradients during the 3D Gaussian splatting process when using Stable Diffusion XL, we propose a pixel-by-pixel gradient clipping method. Extensive experiments show that our model significantly surpasses the state-of-the-art models in terms of visual quality and performance. Code: https://github.com/xingy038/Dreamer-XL.
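The abstract does not spell out the clipping rule, so the following is only an illustrative sketch of what per-pixel gradient clipping could look like. It assumes a norm-based rule in which each pixel's gradient vector (across channels) is rescaled so that its L2 norm does not exceed a threshold. The function name, the `max_norm` parameter, and the `(C, H, W)` layout are hypothetical.

```python
import numpy as np

def clip_gradient_per_pixel(grad, max_norm, eps=1e-8):
    """Rescale each pixel's gradient so its L2 norm is at most max_norm.

    grad: array of shape (C, H, W), the gradient with respect to a rendered
          image or latent (layout is an assumption of this sketch).
    Pixels whose norm is already below max_norm are left unchanged, and the
    direction of every pixel's gradient is preserved.
    """
    grad = np.asarray(grad, dtype=np.float64)
    # Per-pixel norm over the channel axis, kept broadcastable: (1, H, W).
    norms = np.linalg.norm(grad, axis=0, keepdims=True)
    # Scale factor is 1 for small gradients and max_norm / norm for large ones.
    scale = np.minimum(1.0, max_norm / (norms + eps))
    return grad * scale

# Toy usage: one pixel with a large gradient, one with a small gradient.
grad = np.zeros((3, 2, 2))
grad[:, 0, 0] = [3.0, 4.0, 0.0]   # norm 5, will be scaled down
grad[:, 1, 1] = [0.1, 0.0, 0.0]   # norm 0.1, will be left as is
clipped = clip_gradient_per_pixel(grad, max_norm=1.0)
```

Compared with clipping the global gradient norm, a per-pixel rule only damps the outlier pixels, so the gradient signal elsewhere in the image is not shrunk along with them.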

Code Implementations: 1 repo