CV · AI · Apr 3

NavCrafter: Exploring 3D Scenes from a Single Image

arXiv:2604.02828 · 29.4 · h-index: 3
Predicted impact: top 13% in CV · last 90 days · Originality: Highly original
AI Analysis

The paper addresses 3D scene exploration from limited 2D data, with applications in computer vision and graphics, and presents a novel method rather than an incremental improvement.

The paper tackles the creation of flexible 3D scenes from a single image when direct 3D data acquisition is costly. It introduces NavCrafter, which synthesizes novel-view video sequences with camera controllability. The authors report state-of-the-art novel-view synthesis under large viewpoint shifts and substantially improved 3D reconstruction fidelity.

Creating flexible 3D scenes from a single image is vital when direct 3D data acquisition is costly or impractical. We introduce NavCrafter, a novel framework that explores 3D scenes from a single image by synthesizing novel-view video sequences with camera controllability and temporal-spatial consistency. NavCrafter leverages video diffusion models to capture rich 3D priors and adopts a geometry-aware expansion strategy to progressively extend scene coverage. To enable controllable multi-view synthesis, we introduce a multi-stage camera control mechanism that conditions diffusion models with diverse trajectories via dual-branch camera injection and attention modulation. We further propose a collision-aware camera trajectory planner and an enhanced 3D Gaussian Splatting (3DGS) pipeline with depth-aligned supervision, structural regularization and refinement. Extensive experiments demonstrate that NavCrafter achieves state-of-the-art novel-view synthesis under large viewpoint shifts and substantially improves 3D reconstruction fidelity.
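The abstract does not specify how the collision-aware camera trajectory planner works. As a rough illustration of the general idea only, and not the authors' method, the sketch below back-projects a depth map into 3D points with a pinhole model and truncates a candidate camera path at the first waypoint that comes too close to the scene. The function names `unproject_depth` and `clip_trajectory`, the `min_clearance` parameter, and the toy intrinsics are all hypothetical.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into camera-frame 3D points (H*W, 3).

    Assumes a simple pinhole camera; this is an illustrative choice,
    not something stated in the paper.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def clip_trajectory(points, waypoints, min_clearance):
    """Keep the leading waypoints that stay at least min_clearance
    away from every scene point; stop at the first violation."""
    safe = []
    for p in waypoints:
        dists = np.linalg.norm(points - p, axis=1)
        if dists.min() < min_clearance:
            break  # the path would pass too close to geometry
        safe.append(p)
    return np.array(safe).reshape(-1, 3)

# Toy scene: a flat wall 2 units in front of the camera.
depth = np.full((4, 4), 2.0)
points = unproject_depth(depth, fx=4.0, fy=4.0, cx=1.5, cy=1.5)

# Candidate path: move straight forward from the origin toward the wall.
z = np.linspace(0.0, 2.0, 5)
waypoints = np.stack([np.zeros(5), np.zeros(5), z], axis=1)

safe = clip_trajectory(points, waypoints, min_clearance=0.8)
print(safe)
```

In this toy setup the path is cut before the camera gets within 0.8 units of the wall. A real planner would likely work from estimated geometry with uncertainty and score many candidate trajectories rather than truncating a single one.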
