CV · Jun 5, 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

arXiv:2406.03184v2 · 31 citations
Originality: Incremental advance
AI Analysis

This work addresses low-quality 3D reconstructions in computer vision and graphics applications, though it is incremental because it builds on existing diffusion-based approaches.

The paper tackles the problem of data bias in single image-to-3D generation by introducing Ouroboros3D, a unified framework that integrates multi-view image generation and 3D reconstruction into a recursive diffusion process, resulting in improved geometric consistency and outperforming existing methods.

Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms separation of these two stages and existing methods that combine them at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/
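The feedback loop described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: `toy_denoise`, `toy_reconstruct`, `toy_render`, and `recursive_sample` are hypothetical names, images are stand-in lists of floats, and the update rule is a simple interpolation rather than a real diffusion sampler. It only shows how maps rendered from the reconstruction at one step could be passed to the denoiser as extra conditions at the next step.

```python
import random
from typing import List, Optional

View = List[float]   # toy stand-in for one image (flat list of pixel values)
Views = List[View]   # one entry per camera view


def toy_denoise(noisy: Views, t: int, image: View, feedback: Optional[Views]) -> Views:
    """Hypothetical multi-view denoiser: predicts clean views from noisy ones.

    Conditions on the input image and, when available, on the maps rendered
    from the previous step's reconstruction (assumed interface).
    """
    preds = []
    for v, view in enumerate(noisy):
        if feedback is None:
            target = image
        else:
            target = [0.5 * (a + b) for a, b in zip(image, feedback[v])]
        preds.append([0.5 * (x + y) for x, y in zip(view, target)])
    return preds


def toy_reconstruct(views: Views) -> View:
    """Hypothetical reconstruction module: fuses all views into one shared representation."""
    n = len(views)
    return [sum(column) / n for column in zip(*views)]


def toy_render(representation: View, num_views: int) -> Views:
    """Hypothetical renderer: emits one 3D-aware map per view from the shared representation."""
    return [list(representation) for _ in range(num_views)]


def recursive_sample(denoise, reconstruct, render, image, num_views, num_steps, seed=0):
    """Run the denoising loop, feeding rendered maps back as conditions."""
    rng = random.Random(seed)
    # Start every view from Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in image] for _ in range(num_views)]
    feedback = None        # no 3D-aware maps exist before the first step
    representation = None
    for t in range(num_steps, 0, -1):
        x0_pred = denoise(x, t, image, feedback)       # uses the previous step's maps
        representation = reconstruct(x0_pred)          # 3D from predicted clean views
        feedback = render(representation, num_views)   # conditions for the next step
        # Simplified update (assumption): interpolate toward the prediction.
        keep = (t - 1) / t
        x = [
            [keep * xi + (1.0 - keep) * pi for xi, pi in zip(xv, pv)]
            for xv, pv in zip(x, x0_pred)
        ]
    return x, representation
```

A call such as `recursive_sample(toy_denoise, toy_reconstruct, toy_render, [0.1, 0.2, 0.3], num_views=4, num_steps=5)` returns the final views together with the last reconstruction. In the framework the abstract describes, the two modules are trained jointly so that the denoiser learns to exploit this feedback; the sketch covers only the inference-time loop.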

Code Implementations: 1 repo
