Turbo3D: Ultra-fast Text-to-3D Generation
Turbo3D addresses the need for faster 3D content creation in applications such as gaming and VR, and represents a strong incremental improvement in efficiency.
The paper tackles the problem of slow text-to-3D generation by introducing Turbo3D, which generates high-quality Gaussian splatting assets in under one second, outperforming previous baselines in speed and quality.
We present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.
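The Dual-Teacher idea can be illustrated with a minimal sketch, assuming only what the abstract states: a multi-view teacher supervises the student's four views jointly, which gives a cross-view consistency signal, and a single-view teacher supervises each view independently, which gives a per-view realism signal. Everything else here is a hypothetical stand-in and not the paper's objective. The function name `dual_teacher_loss`, the weights `w_mv` and `w_sv`, the `(V, C, H, W)` latent layout, and the use of a plain mean-squared regression toward each teacher's target are all illustrative choices. The actual distillation loss, noise schedule, and gradient handling are not described in this section.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    # Plain mean squared error; a stand-in for the real distillation term (assumption).
    return float(np.mean((a - b) ** 2))


def dual_teacher_loss(student_views, mv_teacher, sv_teacher, w_mv=1.0, w_sv=1.0):
    """Hypothetical combined objective for a 4-view student.

    student_views: array of shape (V, C, H, W), the student's V latent views (assumed layout).
    mv_teacher: callable taking the full (V, C, H, W) stack and returning a target of the same shape.
    sv_teacher: callable taking one (C, H, W) view and returning a target of the same shape.
    w_mv, w_sv: hypothetical weights balancing the two teachers.
    """
    # Multi-view teacher sees all views at once, so its target can encode cross-view consistency.
    mv_target = mv_teacher(student_views)
    loss_mv = mse(student_views, mv_target)

    # Single-view teacher scores each view on its own, so it only pushes per-view realism.
    sv_target = np.stack([sv_teacher(view) for view in student_views])
    loss_sv = mse(student_views, sv_target)

    return w_mv * loss_mv + w_sv * loss_sv


if __name__ == "__main__":
    # Toy teachers, purely illustrative:
    # the "multi-view" teacher pulls every view toward the cross-view mean,
    # the "single-view" teacher leaves each view unchanged.
    views = np.stack([np.full((2, 2, 2), float(i)) for i in range(4)])
    mv = lambda x: np.broadcast_to(x.mean(axis=0, keepdims=True), x.shape)
    sv = lambda v: v
    print(dual_teacher_loss(views, mv, sv))
```

In the toy run the single-view term is zero because that teacher returns each view unchanged, so the entire loss comes from the inconsistency between views. This is the split of responsibilities the abstract attributes to the two teachers. A real implementation would operate on noised latents inside a diffusion framework with automatic differentiation rather than on NumPy arrays.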