CV · Aug 23, 2025

Structural Energy-Guided Sampling for View-Consistent Text-to-3D

Qing Zhang, Jinguang Tong, Jie Hong, Jing Zhang, Xuesong Li

arXiv:2508.16917v1 · 1 citation · h-index: 3

Originality: Incremental advance

AI Analysis

The work addresses a key issue in text-to-3D generation, relevant to applications such as 3D modeling and visualization, but the advance is incremental because it builds on existing SDS/VSD pipelines.

The paper tackles the Janus problem in text-to-3D generation, where objects appear correct from the front but distorted from other angles. It proposes Structural Energy-Guided Sampling (SEGS), a training-free framework that enforces multi-view consistency at sampling time. The authors report that it reduces Janus artifacts and improves geometric alignment.

Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time. SEGS defines a structural energy in a PCA subspace of intermediate U-Net features and injects its gradients into the denoising trajectory, steering geometry toward the intended viewpoint while preserving appearance fidelity. Integrated seamlessly into SDS/VSD pipelines, SEGS significantly reduces Janus artifacts, achieving improved geometric alignment and viewpoint consistency without retraining or weight modification.
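The abstract describes the mechanism only at a high level, so the following is a toy sketch of energy guidance in a PCA subspace, not the paper's formulation. Everything in it is an assumption made for illustration: a fixed linear map `W` stands in for the intermediate U-Net features, the energy is a squared distance to a target in PCA coordinates, and the names `pca_basis`, `structural_energy`, `energy_grad`, and `guided_step` are hypothetical.

```python
import numpy as np


def pca_basis(features, k):
    """Return (mean, basis); basis columns are the top-k principal directions."""
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (feature_dim, k)


def structural_energy(x, W, mean, basis, target):
    """Assumed energy: half the squared distance to `target` in PCA coordinates."""
    coords = basis.T @ (W @ x - mean)  # toy "features" are W @ x
    diff = coords - target
    return 0.5 * float(diff @ diff)


def energy_grad(x, W, mean, basis, target):
    """Closed-form gradient of the energy above with respect to x."""
    diff = basis.T @ (W @ x - mean) - target
    return W.T @ (basis @ diff)


def guided_step(x, denoise_update, W, mean, basis, target, scale):
    """Apply a denoising update, then move down the energy gradient."""
    return x + denoise_update - scale * energy_grad(x, W, mean, basis, target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 8))               # hypothetical feature extractor
    reference = rng.normal(size=(32, 16))      # hypothetical reference features
    mean, basis = pca_basis(reference, k=4)
    target = basis.T @ (reference[0] - mean)   # PCA coordinates to steer toward
    x = rng.normal(size=8)                     # current latent
    scale = 1.0 / np.linalg.norm(W, 2) ** 2    # step size small enough to descend
    before = structural_energy(x, W, mean, basis, target)
    x = guided_step(x, np.zeros(8), W, mean, basis, target, scale)
    after = structural_energy(x, W, mean, basis, target)
    print(f"energy before: {before:.4f}, after: {after:.4f}")
```

In a real diffusion pipeline the features would come from a network rather than a linear map, so the gradient would be obtained by automatic differentiation instead of the closed form used here. The guidance scale would also need tuning instead of being derived from a spectral norm.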
