CV · Nov 25, 2024

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

arXiv:2411.16157v3 · 24 citations · h-index: 13 · Has Code · CVPR
Originality: Highly original
AI Analysis

This paper addresses versatile multi-view generation for computer vision applications. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles Novel View Synthesis (NVS): generating multiple novel views from any image. It introduces MVGenMaster, a diffusion model enhanced with 3D priors, which can generate up to 100 views with improved generalization and 3D consistency, as demonstrated through evaluations on in- and out-of-domain benchmarks.

We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks. MVGenMaster leverages 3D priors that are warped using metric depth and camera poses, significantly enhancing both generalization and 3D consistency in NVS. Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses with a single forward process. Additionally, we have developed a comprehensive large-scale multi-view image dataset called MvD-1M, comprising up to 1.6 million scenes, equipped with well-aligned metric depth to train MVGenMaster. Moreover, we present several training and model modifications to strengthen the model with scaled-up datasets. Extensive evaluations across in- and out-of-domain benchmarks demonstrate the effectiveness of our proposed method and data formulation. Models and codes will be released at https://github.com/ewrfcas/MVGenMaster/.
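The abstract says the 3D priors are "warped using metric depth and camera poses". As a rough illustration of what such a warp involves geometrically, the sketch below reprojects reference-view pixels into a target view under a pinhole camera model. It is not the authors' implementation: the function name `warp_pixels`, the shared intrinsics matrix `K`, and the 4x4 reference-to-target transform are all assumptions made for this example.

```python
import numpy as np

def warp_pixels(depth, K, T_ref_to_tgt):
    """Reproject every reference-view pixel into a target view (hypothetical helper).

    depth:        (H, W) metric depth of the reference view.
    K:            (3, 3) pinhole intrinsics, assumed shared by both views.
    T_ref_to_tgt: (4, 4) rigid transform from reference to target camera frame.
    Returns (H, W, 2) target pixel coordinates and (H, W) target-frame depth.
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates: (u, v, 1) for every pixel.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project: X_ref = depth * K^{-1} [u, v, 1]^T.
    rays = pix @ np.linalg.inv(K).T
    pts_ref = rays * depth[..., None]

    # Move the 3D points into the target camera frame.
    R, t = T_ref_to_tgt[:3, :3], T_ref_to_tgt[:3, 3]
    pts_tgt = pts_ref @ R.T + t

    # Project with the same intrinsics and divide by depth.
    proj = pts_tgt @ K.T
    z = proj[..., 2]
    # Points with z <= 0 lie behind the target camera; callers should mask them.
    uv = proj[..., :2] / z[..., None]
    return uv, z
```

With an identity pose the function returns the original pixel grid, and a pure sideways translation shifts each pixel by focal length times baseline divided by depth. A complete warp would additionally splat colors or features to the returned coordinates and resolve occlusions, which this sketch leaves out.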

Code Implementations: 1 repo
