CV · Apr 22, 2024

MultiBooth: Towards Generating All Your Concepts in an Image from Text

arXiv:2404.14239v3 · 53 citations · h-index: 15 · AAAI
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating images that contain multiple custom concepts from text, aimed at users in creative and AI-driven design. It represents an incremental improvement over existing methods.

The paper tackles low concept fidelity and high inference cost in multi-concept image generation from text. It introduces MultiBooth, a two-phase approach that first learns each concept separately and then uses bounding boxes to place each concept in its own region. In qualitative and quantitative evaluations, it surpasses various baselines in both performance and computational efficiency.

This paper introduces MultiBooth, a novel and efficient technique for multi-concept customization in image generation from text. Despite the significant advancements in customized generation methods, particularly with the success of diffusion models, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. This method enables the creation of individual concepts within their specified regions, thereby facilitating the formation of multi-concept images. This strategy not only improves concept fidelity but also reduces additional inference cost. MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency. Project Page: https://multibooth.github.io/
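The region idea in the multi-concept integration phase can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each concept already has its own cross-attention output on a small latent grid, and it simply copies that output into the cells covered by the concept's bounding box. The names `box_mask` and `compose_by_region`, the normalized `(x0, y0, x1, y1)` box format, and the rule that later boxes overwrite earlier ones where they overlap are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), normalized to [0, 1].
Box = Tuple[float, float, float, float]
Grid = List[List[float]]


def box_mask(box: Box, height: int, width: int) -> List[List[bool]]:
    """Return a grid that is True where a cell's centre lies inside the box."""
    x0, y0, x1, y1 = box
    mask = []
    for r in range(height):
        cy = (r + 0.5) / height  # vertical centre of row r
        row = []
        for c in range(width):
            cx = (c + 0.5) / width  # horizontal centre of column c
            row.append(x0 <= cx < x1 and y0 <= cy < y1)
        mask.append(row)
    return mask


def compose_by_region(base: Grid, concept_outputs: List[Grid],
                      boxes: List[Box]) -> Grid:
    """Use each concept's output inside its box and the base output elsewhere.

    Assumption: if boxes overlap, the later concept in the list wins.
    """
    height, width = len(base), len(base[0])
    result = [row[:] for row in base]  # copy so the base grid is not mutated
    for output, box in zip(concept_outputs, boxes):
        mask = box_mask(box, height, width)
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    result[r][c] = output[r][c]
    return result


# Toy 4x4 grid: scalar stand-ins for per-position attention outputs.
base = [[0.0] * 4 for _ in range(4)]
dog = [[1.0] * 4 for _ in range(4)]   # stand-in output for concept 1
cat = [[2.0] * 4 for _ in range(4)]   # stand-in output for concept 2
combined = compose_by_region(
    base, [dog, cat],
    [(0.0, 0.0, 0.5, 1.0),   # concept 1: left half
     (0.5, 0.0, 1.0, 0.5)],  # concept 2: top-right quadrant
)
for row in combined:
    print(row)
```

In this toy layout the left half of the grid takes the first concept's values, the top-right quadrant takes the second concept's values, and the remaining cells keep the base output. A real diffusion pipeline would apply the same kind of mask to attention tensors at each cross-attention layer rather than to scalar grids.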

Code Implementations: 1 repo
