CV, Mar 18, 2025

Bolt3D: Generating 3D Scenes in Seconds

arXiv:2503.14445v2, 52 citations, h-index: 36
Originality: Incremental advance
AI Analysis

Bolt3D enables fast 3D content creation for applications like gaming and VR, though it builds incrementally on existing 2D diffusion architectures.

The paper tackles the problem of slow 3D scene generation by introducing Bolt3D, a latent diffusion model that generates 3D scenes from images in under 7 seconds on a single GPU, reducing inference cost by up to 300 times compared to prior methods.

We present a latent diffusion model for fast feed-forward 3D scene generation. Given one or more images, our model Bolt3D directly samples a 3D scene representation in less than seven seconds on a single GPU. We achieve this by leveraging powerful and scalable existing 2D diffusion network architectures to produce consistent high-fidelity 3D scene representations. To train this model, we create a large-scale multiview-consistent dataset of 3D geometry and appearance by applying state-of-the-art dense 3D reconstruction techniques to existing multiview image datasets. Compared to prior multiview generative models that require per-scene optimization for 3D reconstruction, Bolt3D reduces the inference cost by a factor of up to 300.

