DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting
This work addresses the challenge of semantic consistency in text-guided image inpainting, with applications such as photo editing, though the contribution appears incremental because it builds on existing GAN-based methods.
The paper tackles text-guided image inpainting by proposing DAFT-GAN, which uses dual affine transformations to combine text and image features and encodes corrupted and uncorrupted regions separately. The model outperforms existing GAN-based models on three benchmark datasets.
Text-guided image inpainting has received significant research attention in recent years. However, the task remains challenging due to several constraints, such as ensuring alignment between the image and the text, and keeping the distributions of corrupted and uncorrupted regions consistent. We therefore propose a dual affine transformation generative adversarial network (DAFT-GAN) to maintain semantic consistency in text-guided inpainting. DAFT-GAN integrates two affine transformation networks that combine text and image features gradually at each decoding block. Moreover, we minimize information leakage of uncorrupted features for fine-grained image generation by encoding the corrupted and uncorrupted regions of the masked image separately. Our proposed model outperforms existing GAN-based models in both qualitative and quantitative assessments on three benchmark datasets (MS-COCO, CUB, and Oxford) for text-guided image inpainting.
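To make the idea of combining text and image features through affine transformations more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify layer shapes or how the two affine networks are arranged, so the function names (`linear`, `affine_modulate`, `dual_affine_block`), the ReLU between the two transforms, and the choice to predict a per-channel scale and shift from a sentence embedding are all assumptions made for illustration.

```python
def linear(vec, weights, bias):
    """Plain dense layer: `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def affine_modulate(features, text_vec, scale_params, shift_params):
    """Apply a text-conditioned, channel-wise affine transform.

    features:     list of C channels, each a flat list of spatial values
    text_vec:     text embedding as a list of floats (hypothetical input)
    scale_params: (weights, bias) predicting one scale (gamma) per channel
    shift_params: (weights, bias) predicting one shift (beta) per channel
    """
    gamma = linear(text_vec, *scale_params)
    beta = linear(text_vec, *shift_params)
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def dual_affine_block(features, text_vec, first, second):
    """Two affine transforms in sequence with a ReLU in between.

    The stacking order and the ReLU are assumptions, not details
    taken from the paper.
    """
    hidden = affine_modulate(features, text_vec, *first)
    hidden = [[max(0.0, v) for v in channel] for channel in hidden]
    return affine_modulate(hidden, text_vec, *second)


if __name__ == "__main__":
    # Toy input: 2 channels with 2 spatial positions each, 2-d text embedding.
    feats = [[1.0, 2.0], [3.0, -4.0]]
    text = [1.0, 0.5]
    scale = ([[1.0, 0.0], [0.0, 2.0]], [0.0, 0.0])   # predicts gamma
    shift = ([[0.0, 0.0], [0.0, 0.0]], [0.5, -1.0])  # predicts beta
    print(affine_modulate(feats, text, scale, shift))
```

In a decoder, a block like this would be applied at each resolution so that the text conditions the image features gradually rather than once at the input. A real model would use learned convolutional or linear layers in a deep learning framework instead of hand-set weights.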