ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation
This work addresses customizable image synthesis for applications such as advertising and virtual dressing, and represents a strong incremental advance over existing methods.
The paper tackles the challenge of preserving identity while maintaining prompt alignment in both single-concept and condition-free multi-concept image generation. It introduces the ShowFlow-S and ShowFlow-M frameworks, which achieve state-of-the-art performance with significant improvements on metrics such as CLIP-I and CLIP-T.
Customizing image generation remains a core challenge in controllable image synthesis. For single-concept generation, maintaining both identity preservation and prompt alignment is challenging. In multi-concept scenarios, relying solely on a prompt, without additional conditions such as layout boxes or semantic masks, often leads to identity loss and concept omission. In this paper, we introduce ShowFlow, a comprehensive framework designed to tackle these challenges. We propose ShowFlow-S for single-concept image generation and ShowFlow-M for handling multiple concepts. ShowFlow-S introduces a KronA-WED adapter, which integrates a Kronecker adapter with weight and embedding decomposition, and employs a disentangled learning approach with a novel attention regularization objective to enhance single-concept generation. Building on this foundation, ShowFlow-M directly reuses the learned models from ShowFlow-S to support multi-concept generation without extra conditions, incorporating Subject-Adaptive Matching Attention (SAMA) and a layout consistency strategy as plug-and-play modules. Extensive experiments and user studies validate ShowFlow's effectiveness, highlighting its potential in real-world applications such as advertising and virtual dressing.
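
To make the Kronecker adapter mentioned above more concrete, the following is a minimal sketch of a generic Kronecker-product weight update, in which a frozen weight W is adjusted by scale * (A ⊗ B). It is not the KronA-WED adapter itself: the weight and embedding decomposition is not specified in this abstract and is omitted here. The function names, the factor shapes, the zero initialization of B, and the `scale` parameter are all illustrative assumptions.

```python
import numpy as np

def krona_delta(A, B, scale=1.0):
    """Kronecker-product weight update: delta_W = scale * (A kron B)."""
    return scale * np.kron(A, B)

def adapted_forward(x, W, A, B, scale=1.0):
    """Apply the frozen weight W plus the Kronecker update to a batch x."""
    return x @ (W + krona_delta(A, B, scale)).T

rng = np.random.default_rng(0)
d_out, d_in = 64, 48

# Frozen pretrained weight (random stand-in, assumption for illustration).
W = rng.standard_normal((d_out, d_in))

# Trainable factors: (8 x 6) kron (8 x 8) yields a (64 x 48) update.
A = rng.standard_normal((8, 6))
B = np.zeros((8, 8))  # zero init (assumed) so the update starts at zero

x = rng.standard_normal((4, d_in))
y = adapted_forward(x, W, A, B)

# Parameter counts: full weight versus the two Kronecker factors.
n_full = W.size
n_adapter = A.size + B.size
```

With these assumed shapes the two factors hold far fewer parameters than the full weight matrix. Because rank(A ⊗ B) = rank(A) · rank(B), the update is not restricted to low rank in the way a product of two thin matrices is, which is one commonly cited motivation for Kronecker adapters.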