Each Attribute Matters: Contrastive Attention for Sentence-based Image Editing
This work addresses a specific bottleneck in image editing for users needing precise multi-attribute control, representing an incremental improvement over prior methods.
The paper tackles the problem of sentence-based image editing (SIE) with multiple attributes, where existing methods often fail to accurately edit all attributes. The proposed CA-GAN model generates encouraging results on the CUB and COCO datasets, improving attribute editing accuracy.
Sentence-based Image Editing (SIE) aims to edit an image using natural language. Because it offers the potential to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate edits and may even fail to edit some attributes when the query sentence contains multiple editable attributes. To cope with this problem, this paper focuses on enhancing the difference between attributes and proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), which is inspired by contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes formed during training. We then construct an attribute discriminator to ensure effective editing of each attribute. A series of experiments shows that our method generates very encouraging results in sentence-based image editing with multiple attributes on the CUB and COCO datasets. Our code is available at https://github.com/Zlq2021/CA-GAN
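To make the idea of enlarging the editing difference between random attribute combinations more concrete, the following is a minimal sketch in plain Python. It is not the CA-GAN implementation: the function names (`sample_two_combinations`, `attention_map`, `contrastive_attention_loss`), the mean-pooled attribute query, the L1 distance, and the hinge margin are all assumptions chosen for illustration, and the actual module in the released code may differ substantially.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_two_combinations(attributes, rng):
    """Sample two distinct, non-empty subsets of the sentence's attributes.

    Hypothetical helper: each subset is encoded as a bitmask in
    [1, 2**n - 1], so two different masks give two different subsets.
    Requires at least two attributes.
    """
    n = len(attributes)
    mask_a, mask_b = rng.sample(range(1, 2 ** n), 2)

    def pick(mask):
        return [attributes[i] for i in range(n) if (mask >> i) & 1]

    return pick(mask_a), pick(mask_b)


def attention_map(region_feats, attr_embs):
    """Attention over image regions for one attribute combination.

    Assumption: the query is the mean of the attribute embeddings and
    the score of a region is its dot product with that query.
    """
    dim = len(attr_embs[0])
    query = [sum(e[d] for e in attr_embs) / len(attr_embs) for d in range(dim)]
    scores = [sum(r[d] * query[d] for d in range(dim)) for r in region_feats]
    return softmax(scores)


def contrastive_attention_loss(attn_a, attn_b, margin=0.5):
    """Hinge penalty that is positive when two attention maps are too similar.

    Assumption: similarity is measured with the L1 distance between the
    two attention distributions; the loss is zero once they differ by
    at least `margin`.
    """
    dist = sum(abs(a - b) for a, b in zip(attn_a, attn_b))
    return max(0.0, margin - dist)
```

In a training loop, one would sample two combinations per query sentence, compute an attention map for each, and add this penalty to the generator objective so that different attribute combinations attend to different image regions. The attribute discriminator mentioned above is a separate component and is not covered by this sketch.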