CV · AI · Oct 17, 2024

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

arXiv:2410.13370v3 · 18 citations · h-index: 29 · IJCAI
Originality: Incremental advance
AI Analysis

This work addresses the lack of fine-grained control available to users of text-to-image models, though it appears incremental because it builds on existing diffusion models.

The paper tackles fine-grained control in text-to-image diffusion models by introducing component-controllable personalization, a task in which users customize and reconfigure individual components within concepts. The authors report superior performance on this task.

Text-to-image diffusion models can generate high-quality images but lack fine-grained control of visual concepts, limiting their creativity. Thus, we introduce component-controllable personalization, a new task that enables users to customize and reconfigure individual components within concepts. This task faces two challenges: semantic pollution, where undesired elements disrupt the target concept, and semantic imbalance, which causes disproportionate learning of the target concept and component. To address these, we design MagicTailor, a framework that uses Dynamic Masked Degradation to adaptively perturb unwanted visual semantics and Dual-Stream Balancing for more balanced learning of desired visual semantics. The experimental results show that MagicTailor achieves superior performance in this task and enables more personalized and creative image generation.

