CV · Jul 24, 2024

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control

arXiv:2408.05083v1 · 32 citations · h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses a specific limitation in face personalization for AI image generation, offering an incremental improvement over existing text-based editing methods.

The paper targets fine-grained facial attribute control in text-to-image diffusion models, which existing personalization methods struggle to provide. It conditions the diffusion model on StyleGAN's disentangled latent space, aiming for precise inversion with identity preservation and smooth attribute manipulation.

Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation and rely on semantic text-based editing of the generated face. However, more fine-grained control is desired for facial attribute editing, which is challenging to achieve solely with text prompts. In contrast, StyleGAN models learn a rich face prior and enable smooth control over fine-grained attribute editing through latent manipulation. This work uses the disentangled $\mathcal{W+}$ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the existing coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the $\mathcal{W+}$ space, we train a latent mapper to translate latent codes from $\mathcal{W+}$ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
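The two ideas in the abstract, editing a face by moving its latent code along an attribute direction and mapping that code into the T2I token embedding space, can be illustrated with a small sketch. It is not the authors' implementation. The sizes (18 x 512 for the $\mathcal{W+}$ code, 768 for the token embedding), the two-layer MLP, the random "smile" direction, and the names `edit_latent` and `LatentMapper` are all assumptions made for illustration, and the mapper here is untrained.

```python
import numpy as np

# Assumed sizes for illustration only; they are not stated in the text above.
NUM_LAYERS, W_DIM, TOKEN_DIM = 18, 512, 768


def edit_latent(w_plus, direction, strength):
    """Move a W+ code along an attribute direction; strength sets how far."""
    return w_plus + strength * direction


class LatentMapper:
    """Hypothetical two-layer MLP from a flattened W+ code to one token embedding."""

    def __init__(self, rng, hidden=256):
        in_dim = NUM_LAYERS * W_DIM
        # Random (untrained) weights, scaled to keep activations bounded.
        self.w1 = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, hidden ** -0.5, size=(hidden, TOKEN_DIM))
        self.b2 = np.zeros(TOKEN_DIM)

    def __call__(self, w_plus):
        h = np.maximum(w_plus.reshape(-1) @ self.w1 + self.b1, 0.0)  # ReLU
        return h @ self.w2 + self.b2


rng = np.random.default_rng(0)
mapper = LatentMapper(rng)

w_plus = rng.normal(size=(NUM_LAYERS, W_DIM))  # stand-in for an inverted face code
smile = rng.normal(size=(NUM_LAYERS, W_DIM))   # stand-in for a learned smile direction
smile /= np.linalg.norm(smile)

# One token embedding per edit strength; in a real pipeline each would replace
# a placeholder token in the prompt before the text encoder runs.
tokens = [mapper(edit_latent(w_plus, smile, s)) for s in (0.0, 1.0, 2.0)]
```

Because the edit is a linear step in latent space, varying `strength` continuously is what would give the smooth attribute control the abstract describes, while the rest of the prompt keeps its usual coarse text-based control.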

