CV | Apr 26, 2024

Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

arXiv:2404.17419v1 | 8 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work aims to improve 3D generation for applications in computer vision and graphics. The contribution is incremental: it extends an existing model, ImageDream, without fine-tuning it.

The paper tackles 3D generation by using multiple image prompts instead of a single one, which improves multi-view and 3D object generation according to quantitative metrics and qualitative assessments.

Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, because images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without fine-tuning the pre-trained ImageDream multi-view diffusion model.
