MVDream: Multi-view Diffusion for 3D Generation
This work addresses the challenge of generating coherent 3D models from text, with applications in computer graphics and AI. It is an incremental improvement that builds on existing diffusion and 3D generation techniques.
The paper tackles the problem of generating consistent 3D content from text prompts. It introduces MVDream, a multi-view diffusion model that learns from both 2D and 3D data to combine the generalizability of 2D diffusion models with the consistency of 3D renderings, and it significantly enhances the consistency and stability of existing 2D-lifting methods for 3D generation.
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
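To make the Score Distillation Sampling step concrete, the following is a minimal sketch of a single SDS gradient estimate applied to a batch of views. It is not MVDream's implementation. Several things here are assumptions for illustration: the "3D representation" is replaced by the view images themselves (an identity renderer), the multi-view diffusion model is replaced by a hypothetical closed-form noise predictor whose data distribution is a single fixed set of target views, the weighting w(t) = 1 - alpha_bar is one common choice, and the names `sds_gradient`, `make_toy_predictor`, and `optimize_views` are invented for this example.

```python
import numpy as np


def sds_gradient(views, predict_noise, alpha_bar, rng):
    """One SDS gradient estimate with respect to the rendered views.

    views:         array of shape (num_views, H, W, C)
    predict_noise: callable (noisy_views, alpha_bar) -> predicted noise
    alpha_bar:     cumulative noise-schedule coefficient in (0, 1)
    """
    # Forward diffusion: add Gaussian noise to the current renderings.
    eps = rng.standard_normal(views.shape)
    noisy = np.sqrt(alpha_bar) * views + np.sqrt(1.0 - alpha_bar) * eps
    # Ask the (frozen) diffusion prior which noise it thinks was added.
    eps_hat = predict_noise(noisy, alpha_bar)
    # SDS skips the denoiser Jacobian and uses w(t) * (eps_hat - eps).
    # Assumption: w(t) = 1 - alpha_bar.
    return (1.0 - alpha_bar) * (eps_hat - eps)


def make_toy_predictor(target):
    """Hypothetical stand-in for a multi-view diffusion model.

    If the data distribution is a point mass at `target`, the optimal
    noise prediction has this closed form.
    """
    def predict_noise(noisy, alpha_bar):
        return (noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    return predict_noise


def optimize_views(target, steps=50, lr=1.0, seed=0):
    """Gradient descent on the views using SDS gradients (identity renderer)."""
    rng = np.random.default_rng(seed)
    views = rng.standard_normal(target.shape)  # random initialization
    predict_noise = make_toy_predictor(target)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.1, 0.9)  # random noise level per step
        views = views - lr * sds_gradient(views, predict_noise, alpha_bar, rng)
    return views


# Toy "multi-view" target; the number of views is arbitrary here.
target = np.full((4, 8, 8, 3), 0.5)
result = optimize_views(target)
print("max abs error:", float(np.abs(result - target).max()))
```

In this toy setting the sampled noise cancels exactly, so each step pulls the views straight toward the prior's target and the printed error should be very small. In an actual 2D-lifting pipeline the gradient would instead be backpropagated through a differentiable renderer into the parameters of the 3D representation, and the noise predictor would be the text-conditioned diffusion model.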