CV | CL | Oct 31, 2024

Scaling Concept With Text-Guided Diffusion Models

arXiv:2410.24151v1 | 11 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the need for fine-grained control over concepts in generative models, enabling tasks such as canonical pose generation and sound highlighting. It is an incremental advance because it builds on existing text-guided diffusion frameworks.

The paper tackles the problem of enhancing or suppressing existing concepts in text-guided diffusion models rather than replacing them. It introduces ScalingConcept, a method that scales decomposed concepts in real inputs without adding new elements, enabling novel zero-shot applications across the image and audio domains.

Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger). In this work, we explore a novel approach: instead of replacing a concept, can we enhance or suppress the concept itself? Through an empirical study, we identify a trend where concepts can be decomposed in text-guided diffusion models. Leveraging this insight, we introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements. To systematically evaluate our approach, we present the WeakConcept-10 dataset, where concepts are imperfect and need to be enhanced. More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains, including tasks such as canonical pose generation and generative sound highlighting or removal.
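
To make the idea of scaling a concept "up or down" concrete, here is a minimal sketch written in the style of classifier-free guidance. It is an assumption for illustration, not the paper's formulation: the function `scale_concept`, the factor `omega`, and the toy vectors standing in for a model's unconditional and concept-conditioned noise predictions are all hypothetical.

```python
from typing import List, Sequence


def scale_concept(
    eps_null: Sequence[float],
    eps_concept: Sequence[float],
    omega: float,
) -> List[float]:
    """Scale the concept direction between two noise predictions.

    eps_null:    prediction without the concept prompt (hypothetical stand-in)
    eps_concept: prediction with the concept prompt (hypothetical stand-in)
    omega:       0 removes the concept direction, 1 leaves it unchanged,
                 values above 1 amplify it
    """
    if len(eps_null) != len(eps_concept):
        raise ValueError("predictions must have the same length")
    # The difference (c - n) is treated as the "concept" component.
    return [n + omega * (c - n) for n, c in zip(eps_null, eps_concept)]


# Toy vectors in place of real diffusion model outputs.
eps_null = [0.0, 1.0, -1.0]
eps_concept = [1.0, 1.0, 0.0]

suppressed = scale_concept(eps_null, eps_concept, 0.0)
unchanged = scale_concept(eps_null, eps_concept, 1.0)
enhanced = scale_concept(eps_null, eps_concept, 2.0)
```

In this toy setup a single scalar controls both suppression and enhancement, and no second concept is introduced, which mirrors the abstract's framing of scaling an existing concept rather than replacing it.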

