CV | AI | Jun 9, 2025

NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation

arXiv:2506.07698v1 | 2 citations | h-index: 7 | ICME
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating consistent 3D content from a single image, a problem relevant to creators of 3D AI-generated content. It represents an incremental improvement over prior methods.

The paper tackles the problem of insufficient multi-view consistency in single-image-to-3D generation by introducing NOVA3D, which leverages 3D priors from a pretrained video diffusion model and integrates geometric information, achieving improved generalization and consistency over existing baselines.

3D AI-generated content (AIGC) has made it increasingly accessible for anyone to become a 3D content creator. While recent methods leverage Score Distillation Sampling to distill 3D objects from pretrained image diffusion models, they often suffer from inadequate 3D priors, leading to insufficient multi-view consistency. In this work, we introduce NOVA3D, an innovative single-image-to-3D generation framework. Our key insight lies in leveraging strong 3D priors from a pretrained video diffusion model and integrating geometric information during multi-view video fine-tuning. To facilitate information exchange between color and geometric domains, we propose the Geometry-Temporal Alignment (GTA) attention mechanism, thereby improving generalization and multi-view consistency. Moreover, we introduce the de-conflict geometry fusion algorithm, which improves texture fidelity by addressing multi-view inaccuracies and resolving discrepancies in pose alignment. Extensive experiments validate the superiority of NOVA3D over existing baselines.
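
For context, the Score Distillation Sampling baseline that the abstract contrasts against is commonly written as the gradient below. This is the standard form introduced with DreamFusion, not a formula taken from the NOVA3D paper, and the notation is an assumption: $g(\theta)$ is a differentiable renderer with 3D parameters $\theta$, $\hat{\epsilon}_\phi$ is the pretrained image diffusion model's noise prediction, $y$ is the conditioning, and $w(t)$ is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

Each rendered view $x$ is supervised independently by a 2D image prior, which is one way to read the abstract's remark that such methods lack adequate 3D priors and therefore struggle with multi-view consistency.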
