CV | May 16, 2024

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

arXiv:2405.10314v1 | 415 citations | h-index: 36 | NIPS
Originality: Incremental advance
AI Analysis

CAT3D addresses the need for efficient 3D scene creation in fields such as graphics and VR, though it builds incrementally on existing diffusion models and 3D reconstruction techniques.

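The paper tackles the problem of creating 3D scenes from limited input images. It uses a multi-view diffusion model to generate consistent novel views, which are then passed to a 3D reconstruction pipeline. The resulting scenes can be created in as little as one minute and rendered in real time, and the method outperforms existing approaches for single-image and few-view 3D scene creation.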

Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io .
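The abstract says the model takes "a set of target novel viewpoints" as input but does not say how those viewpoints are chosen. As a rough illustration only, the sketch below builds one plausible set: cameras evenly spaced on a circular orbit, each looking at the scene centre. The function names (`look_at`, `orbit_poses`), the orbit trajectory, the z-up world, and the OpenGL-style camera convention (camera looks along its own -z axis) are all assumptions made for this example, not details taken from the paper.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """Return a 4x4 camera-to-world matrix (list of rows).

    Assumed convention: columns are the camera's right, up, and
    backward axes, followed by the camera position.
    """
    forward = _normalize(tuple(t - e for t, e in zip(target, eye)))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    back = tuple(-c for c in forward)
    rows = [[right[i], true_up[i], back[i], eye[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def orbit_poses(num_views=8, radius=2.0, height=0.5, target=(0.0, 0.0, 0.0)):
    """Hypothetical helper: cameras on a circle around `target`, all facing it."""
    poses = []
    for k in range(num_views):
        angle = 2.0 * math.pi * k / num_views
        eye = (
            target[0] + radius * math.cos(angle),
            target[1] + radius * math.sin(angle),
            target[2] + height,
        )
        poses.append(look_at(eye, target))
    return poses


if __name__ == "__main__":
    for pose in orbit_poses(num_views=4):
        print([round(pose[i][3], 3) for i in range(3)])  # camera positions
```

In a pipeline like the one the abstract describes, poses of this kind would be handed to the view-generation model along with the input images, and the generated views would then go to the 3D reconstruction stage. Both of those stages are outside the scope of this sketch.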

