CV · Mar 30, 2023

Consistent View Synthesis with Pose-Guided Diffusion Models

arXiv:2303.17598v1 · 136 citations · h-index: 41
Originality: Highly original
AI Analysis

The paper addresses two weaknesses of single-image view synthesis for Virtual Reality applications: a limited range of camera motion and inconsistency across the synthesized views. The analysis rates this as a strong, specific gain in the domain.

The paper tackles novel view synthesis from a single image by proposing a pose-guided diffusion model that generates consistent long-term videos under significant camera movement. On synthetic and real-world datasets, it is reported to be effective against state-of-the-art transformer-based and GAN-based approaches.

Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.
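The abstract does not spell out how the epipolar constraint enters the attention layer, so the following is only a minimal sketch of the underlying geometry, not the paper's implementation. It assumes a pinhole camera with shared intrinsics (focal length `f`, principal point `cx`, `cy`), a relative pose `R`, `t` mapping source-camera coordinates to target-camera coordinates, and a Gaussian falloff with width `sigma` around the epipolar line. All function names and the Gaussian weighting are hypothetical choices made for illustration.

```python
import math

def matmul3(a, b):
    # Product of two 3x3 matrices stored as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose3(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec3(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def skew(t):
    # Cross-product matrix [t]_x, so that skew(t) @ v == t x v.
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def inv_intrinsics(f, cx, cy):
    # Closed-form inverse of K = [[f, 0, cx], [0, f, cy], [0, 0, 1]].
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def fundamental(f, cx, cy, R, t):
    # F = K^-T [t]_x R K^-1, assuming X_target = R @ X_source + t.
    k_inv = inv_intrinsics(f, cx, cy)
    essential = matmul3(skew(t), R)
    return matmul3(transpose3(k_inv), matmul3(essential, k_inv))

def epipolar_weight(F, src_px, tgt_px, sigma=1.0):
    # Epipolar line of the source pixel in the target image: l = F @ x_src.
    a, b, c = matvec3(F, [src_px[0], src_px[1], 1.0])
    norm = math.hypot(a, b)
    if norm == 0.0:
        return 1.0  # Degenerate geometry (e.g. zero translation): no constraint.
    # Perpendicular pixel distance from the target pixel to the line.
    dist = abs(a * tgt_px[0] + b * tgt_px[1] + c) / norm
    # Assumed soft constraint: Gaussian falloff away from the line.
    return math.exp(-0.5 * (dist / sigma) ** 2)

# Example: pure sideways translation, so epipolar lines are horizontal.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
F = fundamental(100.0, 50.0, 50.0, identity, [1.0, 0.0, 0.0])
on_line = epipolar_weight(F, (50.0, 50.0), (80.0, 50.0))
near_line = epipolar_weight(F, (50.0, 50.0), (80.0, 51.0))
far_from_line = epipolar_weight(F, (50.0, 50.0), (80.0, 53.0))
```

In an attention layer, weights of this kind could bias or mask the attention scores so that a query pixel in one view attends mainly to pixels near its epipolar line in the other view. Whether the paper uses a hard mask, a soft weight, or another form is not stated in the abstract.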

