Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Zero123++ addresses the challenge of creating 3D content from limited 2D inputs, with applications in computer graphics and AI. The contribution is incremental, however, because it builds on existing diffusion models such as Stable Diffusion.
The authors tackle the problem of generating 3D-consistent multi-view images from a single input view. The resulting model produces high-quality, consistent outputs and overcomes issues such as texture degradation and geometric misalignment.
We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. To take full advantage of pretrained 2D generative priors, we develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models such as Stable Diffusion. Zero123++ excels in producing high-quality, consistent multi-view images from a single image, overcoming common issues like texture degradation and geometric misalignment. Furthermore, we showcase the feasibility of training a ControlNet on Zero123++ for enhanced control over the generation process. The code is available at https://github.com/SUDO-AI-3D/zero123plus.