CV | Aug 9, 2024

DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting

arXiv:2408.04962v1 | 1 citation | h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses semantic consistency in text-guided image inpainting, with applications such as photo editing. The contribution appears incremental, since it builds on existing GAN-based methods.

The paper proposes DAFT-GAN for text-guided image inpainting. The model uses dual affine transformations to combine text and image features and encodes the corrupted and uncorrupted regions separately. It outperforms existing GAN-based models on three benchmark datasets.

In recent years, research on text-guided image inpainting has received significant attention. However, the task remains challenging due to several constraints, such as ensuring alignment between the image and the text, and keeping the distribution consistent between corrupted and uncorrupted regions. We therefore propose a dual affine transformation generative adversarial network (DAFT-GAN) to maintain semantic consistency in text-guided inpainting. DAFT-GAN integrates two affine transformation networks to combine text and image features gradually in each decoding block. Moreover, we minimize information leakage of uncorrupted features for fine-grained image generation by encoding the corrupted and uncorrupted regions of the masked image separately. Our proposed model outperforms existing GAN-based models in both qualitative and quantitative assessments on three benchmark datasets (MS-COCO, CUB, and Oxford) for text-guided image inpainting.
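
The idea of combining conditioning features with image features through affine transformations inside a decoding block can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear maps that predict the scale and shift, the ReLU between the two steps, and the choice to condition the first transformation on a text vector and the second on an image vector are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np

def affine_modulate(features, cond, w_gamma, w_beta):
    """Scale and shift each channel of `features` using values predicted from `cond`.

    features: (C, H, W) feature map
    cond:     (D,) conditioning vector (hypothetical text or image embedding)
    w_gamma, w_beta: (C, D) linear maps, assumed here in place of learned networks
    """
    gamma = w_gamma @ cond  # per-channel scale, shape (C,)
    beta = w_beta @ cond    # per-channel shift, shape (C,)
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]

def dual_affine_block(features, text_vec, image_vec, params):
    """Apply two affine modulations in sequence (assumed ordering: text, then image)."""
    h = affine_modulate(features, text_vec, params["wg_text"], params["wb_text"])
    h = np.maximum(h, 0.0)  # ReLU between the two steps (an assumption)
    return affine_modulate(h, image_vec, params["wg_img"], params["wb_img"])
```

Calling `dual_affine_block` once per decoding block, with the same conditioning vectors but different parameters, would correspond to the gradual fusion the abstract describes, under the assumptions above.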

