Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
This work addresses the challenge of generating consistent novel views from a single image, a problem relevant to computer vision and graphics. It represents an incremental improvement that integrates geometric priors into diffusion models.
The paper tackles single-image novel view synthesis by introducing PointmapDiffusion, a framework that conditions pre-trained 2D diffusion models on pointmaps. The result is high-quality, multi-view consistent output with significantly fewer trainable parameters than the baselines.
In this paper, we present PointmapDiffusion, a novel framework for single-image novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our method is the first to leverage pointmaps (i.e., rasterized 3D scene coordinates) as a conditioning signal. These pointmaps capture geometric priors from the reference image and use them to guide the diffusion process. We embed reference attention blocks and a ControlNet for pointmap features, which lets our model balance generative capability and geometric consistency and enables accurate view synthesis across varying viewpoints. Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion achieves high-quality, multi-view consistent results for single-image NVS with significantly fewer trainable parameters than other baselines.
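The abstract defines a pointmap as rasterized 3D scene coordinates but does not say how one is produced. The following is a minimal NumPy sketch of one common way to build such a signal, not the paper's code. It assumes a pinhole camera with intrinsics `K`, a per-pixel depth map for the reference image (for example from a monocular depth estimator), and a known relative pose `(R, t)` from the reference camera to the target camera. The function names `depth_to_pointmap` and `rasterize_pointmap` are hypothetical. The sketch unprojects the reference depth into 3D points and splats them into the target view with a z-buffer, so each target pixel stores the reference-frame coordinates of its nearest point. Pixels hit by no point are left as NaN.

```python
import numpy as np


def depth_to_pointmap(depth, K):
    """Unproject an HxW depth map into an HxWx3 pointmap in the camera frame.

    Assumes a pinhole model: X = depth * K^{-1} [u, v, 1]^T for pixel (u, v).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T  # per-pixel ray K^{-1} [u, v, 1]^T
    return rays * depth[..., None]


def rasterize_pointmap(points, K, R, t, h, w):
    """Z-buffer reference-frame points (N, 3) into an HxWx3 target-view pointmap.

    Hypothetical helper: (R, t) maps reference-camera coordinates into the
    target camera frame. Each pixel keeps the reference-frame coordinates of
    the nearest projected point; uncovered pixels are NaN.
    """
    cam = points @ R.T + t  # points expressed in the target camera frame
    z = cam[:, 2]
    valid = z > 1e-6  # drop points at or behind the target camera
    cam, pts, z = cam[valid], points[valid], z[valid]

    proj = cam @ K.T  # homogeneous image coordinates
    u = np.round(proj[:, 0] / z).astype(int)
    v = np.round(proj[:, 1] / z).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, pts = u[inside], v[inside], z[inside], pts[inside]

    # Sort by pixel index, then by depth, and keep the first (nearest) per pixel.
    pix = v * w + u
    order = np.lexsort((z, pix))  # last key (pix) is the primary sort key
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]

    out = np.full((h * w, 3), np.nan)
    out[pix[keep]] = pts[keep]
    return out.reshape(h, w, 3)


if __name__ == "__main__":
    K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]])
    pm = depth_to_pointmap(np.full((4, 5), 2.0), K)
    # With an identity pose, every pixel reprojects onto itself.
    target = rasterize_pointmap(pm.reshape(-1, 3), K, np.eye(3), np.zeros(3), 4, 5)
    print(np.allclose(target, pm))
```

Under a nonzero pose, the NaN pixels mark regions the reference image does not observe. In a setup like this, those are the areas a pointmap-conditioned generator would have to fill in. Storing reference-frame coordinates rather than target-frame depth is one possible convention, chosen here only for illustration.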