MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
This work addresses the lack of fine-grained control available to users of text-to-image models, though it appears incremental because it builds on existing diffusion models.
The paper tackles the lack of fine-grained control in text-to-image diffusion models by introducing component-controllable personalization, a task in which users customize and reconfigure individual components within concepts, and it reports superior performance on this task.
Text-to-image diffusion models can generate high-quality images but lack fine-grained control of visual concepts, limiting their creativity. Thus, we introduce component-controllable personalization, a new task that enables users to customize and reconfigure individual components within concepts. This task faces two challenges: semantic pollution, where undesired elements disrupt the target concept, and semantic imbalance, which causes disproportionate learning of the target concept and component. To address these, we design MagicTailor, a framework that uses Dynamic Masked Degradation to adaptively perturb unwanted visual semantics and Dual-Stream Balancing for more balanced learning of desired visual semantics. The experimental results show that MagicTailor achieves superior performance in this task and enables more personalized and creative image generation.
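To make the idea of adaptively perturbing unwanted visual semantics more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the perturbation is applied, so everything here is an assumption made for illustration: Gaussian noise is added only outside a binary mask of the target region, and the noise strength decays linearly over training. The function names `degradation_strength` and `masked_degrade` are hypothetical.

```python
import random


def degradation_strength(step, total_steps, start=1.0, end=0.0):
    """Hypothetical schedule: linearly interpolate noise strength from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * frac


def masked_degrade(image, mask, strength, rng):
    """Add Gaussian noise only where mask == 0 (pixels outside the target region).

    image: 2D list of floats; mask: 2D list of 0/1 with the same shape.
    Pixels with mask == 1 are returned unchanged.
    """
    out = []
    for img_row, mask_row in zip(image, mask):
        out.append([
            px if keep else px + strength * rng.gauss(0.0, 1.0)
            for px, keep in zip(img_row, mask_row)
        ])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[0.5, 0.5], [0.5, 0.5]]
    mask = [[1, 0], [0, 1]]  # 1 marks the region to keep clean (assumed convention)
    for step in (0, 50, 100):
        s = degradation_strength(step, 100)
        print(step, s, masked_degrade(image, mask, s, rng))
```

In this sketch the masked-in pixels are never touched, while the surrounding pixels are perturbed strongly early in training and left clean by the end. The linear schedule and the choice of Gaussian noise are placeholders; the paper's actual design may differ.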