ATT3D: Amortized Text-to-3D Object Synthesis
This work addresses a computational bottleneck for researchers and practitioners in 3D content creation, though the contribution is incremental in that it builds on existing methods such as DreamFusion.
The paper tackles slow per-prompt optimization in text-to-3D synthesis by amortizing training over many prompts with a single model, which reduces total training time relative to per-prompt optimization and enables generalization to unseen prompts and smooth interpolation between prompts.
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework, Amortized Text-to-3D (ATT3D), enables knowledge sharing between prompts to generalize to unseen setups and to produce smooth interpolations between text for novel assets and simple animations.
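
The idea of amortizing optimization over prompts can be illustrated with a toy sketch. This is not the ATT3D implementation: the quadratic loss stands in for the real text-to-image guidance objective, the "object" is a short parameter vector rather than a radiance field, and the embeddings, targets, and function names (`toy_loss_grad`, `mapping`, `train_amortized`, `interpolate`) are all hypothetical. It only shows the structure: one shared model is trained by sampling prompts, instead of running a separate optimization per prompt, and interpolated embeddings can then be passed through the same model.

```python
import random


def toy_loss_grad(theta, target):
    """Stand-in per-prompt objective (assumption): squared distance to a target.

    Returns the loss and its gradient with respect to theta.
    """
    diff = [t - g for t, g in zip(theta, target)]
    return sum(d * d for d in diff), [2.0 * d for d in diff]


def mapping(W, c):
    """Shared model: a linear map from a prompt embedding c to object parameters."""
    return [sum(w * x for w, x in zip(row, c)) for row in W]


def train_amortized(embeddings, targets, steps=3000, lr=0.1, seed=0):
    """Train ONE set of weights W across all prompts by sampling a prompt per step."""
    rng = random.Random(seed)
    n_out, n_in = len(targets[0]), len(embeddings[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(steps):
        i = rng.randrange(len(embeddings))  # sample a prompt from the training set
        c = embeddings[i]
        _, grad = toy_loss_grad(mapping(W, c), targets[i])
        # Chain rule for the linear map: dL/dW[r][k] = grad[r] * c[k]
        for r in range(n_out):
            for k in range(n_in):
                W[r][k] -= lr * grad[r] * c[k]
    return W


def interpolate(c_a, c_b, alpha):
    """Linear interpolation between two prompt embeddings."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(c_a, c_b)]


# Hypothetical unit-norm prompt embeddings and per-prompt optimal parameters.
EMBEDDINGS = [[1.0, 0.0], [0.6, 0.8]]
TARGETS = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]

W = train_amortized(EMBEDDINGS, TARGETS)
fitted = [mapping(W, c) for c in EMBEDDINGS]
halfway = mapping(W, interpolate(EMBEDDINGS[0], EMBEDDINGS[1], 0.5))
```

In this toy setting the shared model is linear, so the output for an interpolated embedding is simply the blend of the two fitted outputs. A real mapping network is nonlinear and gives no such guarantee; the sketch is only meant to show where the amortization and the interpolation happen.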