CV · Feb 6, 2022

FEAT: Face Editing with Attention

arXiv:2202.02713v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for precise, text-driven face editing in computer vision. It is an incremental improvement over existing methods, and its main contribution is better spatial control over where an edit is applied.

The paper tackles local face manipulation in GAN-based editing by introducing learned attention maps that confine each edit to a specific region. The authors report more disentangled and controllable facial region editing than alternative methods.

Employing the latent space of pretrained generators has recently been shown to be an effective means for GAN-based face manipulation. The success of this approach heavily relies on the innate disentanglement of the latent space axes of the generator. However, face manipulation often intends to affect local regions only, while common generators do not tend to have the necessary spatial disentanglement. In this paper, we build on the StyleGAN generator, and present a method that explicitly encourages face manipulation to focus on the intended regions by incorporating learned attention maps. During the generation of the edited image, the attention map serves as a mask that guides a blending between the original features and the modified ones. The guidance for the latent space edits is achieved by employing CLIP, which has recently been shown to be effective for text-driven edits. We perform extensive experiments and show that our method can perform disentangled and controllable face manipulations based on text descriptions by attending to the relevant regions only. Both qualitative and quantitative experimental results demonstrate the superiority of our method for facial region editing over alternative methods.
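The blending step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `blend_features`, the single-channel nested-list representation, and the convention that an attention value of 1 selects the edited feature are all assumptions made for illustration. In the paper the attention map is learned and the blend is applied to StyleGAN feature maps during generation.

```python
from typing import List

# Hypothetical representation: one feature channel as an H x W grid of floats.
Grid = List[List[float]]


def blend_features(original: Grid, edited: Grid, attention: Grid) -> Grid:
    """Per-position blend: attention * edited + (1 - attention) * original.

    Assumed convention: attention values lie in [0, 1], where 1 keeps the
    edited feature and 0 keeps the original feature.
    """
    if not (len(original) == len(edited) == len(attention)):
        raise ValueError("all inputs must have the same height")
    blended: Grid = []
    for row_o, row_e, row_a in zip(original, edited, attention):
        if not (len(row_o) == len(row_e) == len(row_a)):
            raise ValueError("all inputs must have the same width")
        blended.append(
            [a * e + (1.0 - a) * o for o, e, a in zip(row_o, row_e, row_a)]
        )
    return blended


# Toy example: the edit is fully applied at the top-left position,
# half applied at the bottom-left, and suppressed everywhere else.
original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
attention = [[1.0, 0.0], [0.5, 0.0]]
result = blend_features(original, edited, attention)
```

Because the mask multiplies the difference between edited and original features, positions where the attention is zero are left exactly as the generator originally produced them, which is what keeps the edit local.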

