CV · LG · Jun 2

Training-Free Multi-Concept LoRA Composition with Prompt-Aware Weighting

arXiv:2606.03792 · 93.6 · h-index: 30 · Has Code
Predicted impact: top 11% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For practitioners using LoRA for personalized image generation, this work offers a training-free method to combine multiple concepts without interference, though it is an incremental improvement over existing composition techniques.

This paper tackles multi-concept LoRA composition in text-to-image generation, proposing prompt-aware weighting methods (W-Switch and W-Composite) that combine LoRA outputs based on semantic importance. The approach achieves consistent improvements over state-of-the-art methods on the ComposLoRA testbed in visual quality, identity preservation, and compositionality.

Low-Rank Adaptation (LoRA) successfully enables personalization in text-to-image generation by adapting pre-trained diffusion models to specific visual concepts and styles. However, extending such models to multi-concept customization remains challenging. Naively combining multiple LoRA weights or their outputs often leads to interference among concepts, resulting in degraded visual quality and reduced fidelity to the reference images of individual concepts. This paper proposes a simple yet effective approach for multi-concept customization by optimally combining the outputs of multiple LoRA modules. We leverage the relative importance of each concept during generation, as inferred from its corresponding prompt tokens, and introduce two methods, W-Switch and W-Composite. Both employ a prompt-aware importance weighting strategy in which each LoRA is weighted according to the semantic influence of its trigger words in the target prompt. In addition, we extend existing quantitative evaluation metrics by proposing a new image-based similarity evaluation framework that assesses image fidelity and identity preservation by comparing real-world reference images with automatically segmented concept regions from generated images. We evaluate our approach on the ComposLoRA testbed and demonstrate consistent improvements over existing state-of-the-art methods in terms of visual quality, identity preservation, and compositionality. Qualitative evaluations, including a Large Language Model (LLM) based assessment and a user study, further validate the effectiveness of the proposed methods and align with the newly introduced quantitative image-based metrics. Our code is available at https://github.com/GeorgeTsoumplekas/Prompt-Aware-Multi-LoRA-Composition.
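
To make the prompt-aware weighting idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that some per-token importance score is already available (the abstract does not say how it is computed) and that each LoRA contributes a residual output that can be added to the base model's output. The names `lora_weights`, `combine_outputs`, `token_scores`, and `triggers` are hypothetical, and the sketch does not distinguish W-Switch from W-Composite.

```python
def lora_weights(token_scores, triggers):
    """Turn per-token importance scores into one normalized weight per LoRA.

    token_scores: dict mapping prompt token -> importance score (assumed given).
    triggers: dict mapping LoRA name -> list of its trigger words.
    """
    # Sum the scores of each LoRA's trigger words found in the prompt.
    raw = {
        name: sum(token_scores.get(tok, 0.0) for tok in toks)
        for name, toks in triggers.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No trigger word carries any score: fall back to uniform weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}


def combine_outputs(base, lora_deltas, weights):
    """Add the weighted sum of LoRA residual outputs to the base output."""
    out = list(base)
    for name, delta in lora_deltas.items():
        for i, d in enumerate(delta):
            out[i] += weights[name] * d
    return out


# Illustrative values only: scores and outputs are made up for the example.
token_scores = {"elsa": 3.0, "kimono": 1.0, "garden": 2.0}
triggers = {"character": ["elsa"], "clothing": ["kimono"]}

weights = lora_weights(token_scores, triggers)
combined = combine_outputs(
    base=[1.0, 1.0],
    lora_deltas={"character": [4.0, 0.0], "clothing": [0.0, 4.0]},
    weights=weights,
)
```

In this toy example the character LoRA receives three times the weight of the clothing LoRA because its trigger word scores three times higher. Tokens that trigger no LoRA ("garden") do not affect the weights.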

