CV · May 7, 2025

Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting

arXiv:2505.04262v1 · h-index: 11 · Neural Networks
Originality: Incremental advance
AI Analysis

This work addresses geometric inconsistencies in text-to-3D generation, with applications such as 3D modeling and virtual reality. It is an incremental improvement over existing methods.

The paper tackles geometric inconsistencies in text-to-3D generation by proposing Coupled Score Distillation (CSD), which integrates multi-view diffusion priors and directly optimizes 3D Gaussian Splatting. The authors report improved geometric consistency and competitive quality in the generated 3D assets.

Score Distillation Sampling (SDS) leverages pretrained 2D diffusion models to advance text-to-3D generation but neglects multi-view correlations, which makes it prone to geometric inconsistencies and multi-face artifacts in the generated 3D content. In this work, we propose Coupled Score Distillation (CSD), a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation while enabling the stable and direct optimization of 3D Gaussian Splatting (3D-GS). Specifically, by reformulating the optimization as a multi-view joint optimization problem, we derive an optimization rule that effectively couples multi-view priors to guide optimization across different viewpoints while preserving the diversity of generated 3D assets. Additionally, we propose a framework that directly optimizes 3D-GS from random initialization to generate geometrically consistent 3D content. We further employ a deformable tetrahedral grid, initialized from 3D-GS and refined through CSD, to produce high-quality meshes. Quantitative and qualitative experimental results demonstrate the efficiency and competitive quality of our approach.
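
For reference, the SDS update that the abstract builds on is commonly written as shown below. This is the standard single-view form of SDS, not the CSD rule derived in the paper, and the notation is assumed here for illustration: \theta denotes the 3D parameters, x = g(\theta, c) the image rendered from camera c, y the text prompt, \hat{\epsilon}_\phi the pretrained noise predictor, and w(t) a timestep weighting.

```latex
% Standard SDS gradient (single rendered view); notation is illustrative.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta, c), \quad
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \quad
\epsilon \sim \mathcal{N}(0, I).
```

In this form each sampled camera c contributes its own independent term, so nothing in the update ties different views together. That is the gap the multi-view joint prior in CSD is meant to close.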
