CV · LG · Mar 28, 2024

Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation

arXiv:2403.19776v2 · 7 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses a specific problem in image generation: combining multiple personalized concepts in a single image. It represents an incremental improvement over prior LoRA-based techniques.

The paper tackles the challenge of generating images that combine multiple concepts using pre-trained LoRA models, where existing methods often fail to include all concepts or produce incorrect combinations. The proposed CLoRA method updates attention maps at test-time to fuse latent representations, resulting in significantly better performance in multi-concept image generation compared to existing methods.

Low-Rank Adaptation (LoRA) has emerged as a powerful and popular technique for personalization, enabling efficient adaptation of pre-trained image generation models for specific tasks without comprehensive retraining. While employing individual pre-trained LoRA models excels at representing single concepts, such as those representing a specific dog or a cat, utilizing multiple LoRA models to capture a variety of concepts in a single image still poses a significant challenge. Existing methods often fall short, primarily because the attention mechanisms within different LoRA models overlap, leading to scenarios where one concept may be completely ignored (e.g., omitting the dog) or where concepts are incorrectly combined (e.g., producing an image of two cats instead of one cat and one dog). We introduce CLoRA, a training-free approach that addresses these limitations by updating the attention maps of multiple LoRA models at test-time, and leveraging the attention maps to create semantic masks for fusing latent representations. This enables the generation of composite images that accurately reflect the characteristics of each LoRA. Our comprehensive qualitative and quantitative evaluations demonstrate that CLoRA significantly outperforms existing methods in multi-concept image generation using LoRAs.
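
The mask-and-fuse step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `semantic_mask` and `fuse_latents`, the min-max normalization, the 0.5 threshold, and the single-channel 2x2 "latents" are all assumptions made for illustration, and the test-time attention-map update itself is omitted. The sketch only shows how per-concept attention maps could be turned into binary masks that decide which LoRA's latent is used at each spatial position.

```python
from typing import List

Grid = List[List[float]]       # H x W map of scalar values (toy single-channel latent)
Mask = List[List[bool]]        # H x W binary mask


def semantic_mask(attn: Grid, threshold: float = 0.5) -> Mask:
    """Min-max normalize an attention map, then threshold it into a binary mask.

    The normalization and threshold value are illustrative assumptions.
    """
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span > threshold for v in row] for row in attn]


def fuse_latents(base: Grid, concept_latents: List[Grid], masks: List[Mask]) -> Grid:
    """Copy the base latent, then overwrite masked positions with each concept's latent.

    If two masks overlap, the later concept in the list wins (a simplification).
    """
    fused = [row[:] for row in base]
    for latent, mask in zip(concept_latents, masks):
        for i, row in enumerate(mask):
            for j, inside in enumerate(row):
                if inside:
                    fused[i][j] = latent[i][j]
    return fused


# Toy example: the "cat" concept attends to the left column, the "dog" to the right.
attn_cat = [[0.9, 0.1], [0.8, 0.2]]
attn_dog = [[0.0, 0.7], [0.1, 0.6]]
base_latent = [[0.0, 0.0], [0.0, 0.0]]
cat_latent = [[1.0, 1.0], [1.0, 1.0]]
dog_latent = [[2.0, 2.0], [2.0, 2.0]]

masks = [semantic_mask(attn_cat), semantic_mask(attn_dog)]
fused = fuse_latents(base_latent, [cat_latent, dog_latent], masks)
print(fused)  # [[1.0, 2.0], [1.0, 2.0]]
```

In this toy case the two masks do not overlap, so each spatial position takes its value from exactly one concept's latent, which is the behavior the abstract describes as reflecting the characteristics of each LoRA.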

