CV · Oct 6, 2025

ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement

arXiv:2510.04668v1 · 1 citation · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in text-to-image generation: representing multiple personalized concepts precisely in a single image. It is an incremental advance, since it builds on existing personalization methods.

The paper tackles concept mixing in multi-concept personalization of text-to-image diffusion models, where multiple learned concepts interfere with one another in the output image. It presents ConceptSplit, which mitigates this unintended interference through token-wise value adaptation during training and attention disentanglement during inference.

In recent years, multi-concept personalization of text-to-image (T2I) diffusion models, which aims to represent several subjects in one image, has gained increasing attention. The main challenge of this task is "concept mixing", where multiple learned concepts interfere or blend undesirably in the output image. To address this issue, we present ConceptSplit, a novel framework that splits the individual concepts during both training and inference. Our framework comprises two key components. First, we introduce Token-wise Value Adaptation (ToVA), a merging-free training method that focuses exclusively on adapting the value projection in cross-attention. Our empirical analysis shows that modifying the key projection, a common approach in existing methods, can disrupt the attention mechanism and lead to concept mixing. Second, we propose Latent Optimization for Disentangled Attention (LODA), which alleviates attention entanglement during inference by optimizing the input latent. Through extensive qualitative and quantitative experiments, we demonstrate that ConceptSplit achieves robust multi-concept personalization, mitigating unintended concept interference. Code is available at https://github.com/KU-VGI/ConceptSplit
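
The ToVA argument above can be illustrated with a toy single-head cross-attention layer in plain Python. This is a minimal sketch, not the authors' implementation: the per-token value offset `delta`, the index `concept_token`, the dimensions, and the random stand-in matrices are all assumptions made for illustration. It only demonstrates the mechanism the abstract appeals to, namely that editing a token's value leaves the attention map untouched, while editing its key changes the map.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(queries, keys):
    """Cross-attention weights softmax(Q K^T / sqrt(d)): rows are pixels, columns are tokens."""
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])


def rand_matrix(rows, cols, rng, low=-1.0, high=1.0):
    """Random stand-in matrix; real models would supply learned weights and features."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, n_pixels, n_tokens = 4, 3, 5
concept_token = 2  # hypothetical position of the personalized token in the prompt

text_emb = rand_matrix(n_tokens, d, rng)          # stand-in text-encoder outputs
queries = rand_matrix(n_pixels, d, rng, 0.1, 1.0)  # stand-in image-latent queries
w_k = rand_matrix(d, d, rng)                      # frozen key projection
w_v = rand_matrix(d, d, rng)                      # frozen value projection

keys = matmul(text_emb, w_k)
values = matmul(text_emb, w_v)
base_attn = attention_map(queries, keys)
base_out = matmul(base_attn, values)

# Assumed form of a token-wise value edit: offset only the concept token's value row.
delta = [0.5] * d  # hypothetical learned offset
adapted_values = [row[:] for row in values]
adapted_values[concept_token] = [v + dv for v, dv in zip(values[concept_token], delta)]
value_only_attn = attention_map(queries, keys)  # keys are untouched
value_only_out = matmul(value_only_attn, adapted_values)

# For contrast: the same offset applied to the token's key changes the attention map.
adapted_keys = [row[:] for row in keys]
adapted_keys[concept_token] = [k + dk for k, dk in zip(keys[concept_token], delta)]
key_attn = attention_map(queries, adapted_keys)

attn_shift = max(
    abs(a - b) for ra, rb in zip(base_attn, key_attn) for a, b in zip(ra, rb)
)
print("attention map unchanged by value edit:", value_only_attn == base_attn)
print("largest attention change from key edit:", attn_shift)
```

In this toy setup, the value-only edit changes each pixel's output by its existing attention weight on the concept token times `delta`, so the edit's influence stays tied to where that token is already attended. The key edit instead redistributes attention across all tokens for every pixel.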

