Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation
This work addresses the challenge of generating more accurate and realistic images from text descriptions for applications in AI and computer vision, representing an incremental improvement over existing methods.
The paper tackles text-to-image generation, where relying only on sentence-level text information misses key attributes. It proposes an attribute-driven memory-augmented GAN that reduces FID from 14.81 to 8.57 on CUB and from 21.42 to 12.39 on COCO.
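To put the reported FID numbers in relative terms, the short calculation below computes the fractional reduction on each dataset. The helper name `relative_reduction` is an illustrative assumption; only the four FID values come from the text.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# FID values as reported in the text (lower is better).
reported_fid = {
    "CUB": (14.81, 8.57),
    "COCO": (21.42, 12.39),
}

for dataset, (baseline, proposed) in reported_fid.items():
    drop = relative_reduction(baseline, proposed)
    print(f"{dataset}: FID {baseline} -> {proposed} ({drop:.1%} lower)")
```

On both datasets the reduction works out to roughly 42% of the baseline FID.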
As a challenging task, text-to-image generation aims to generate photo-realistic and semantically consistent images according to the given text descriptions. Existing methods mainly extract the text information from only one sentence to represent an image, and the text representation strongly affects the quality of the generated image. However, directly utilizing the limited information in one sentence misses some key attribute descriptions, which are crucial for describing an image accurately. To alleviate this problem, we propose an effective text representation method that complements the sentence with attribute information. First, we construct an attribute memory that, together with the sentence input, jointly controls the text-to-image generation. Second, we explore two update mechanisms, sample-aware and sample-joint, to dynamically optimize a generalized attribute memory. Furthermore, we design an attribute-sentence-joint conditional generator learning scheme to align the feature embeddings among multiple representations, which promotes the cross-modal network training. Experimental results show that the proposed method obtains substantial performance improvements on both the CUB (FID from 14.81 to 8.57) and COCO (FID from 21.42 to 12.39) datasets.
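The idea of conditioning generation jointly on a sentence embedding and an attribute memory can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AttributeMemory`, the function `joint_condition`, the embedding size, the averaging readout, and the moving-average update are all hypothetical stand-ins chosen for readability. In particular, the update shown here does not reproduce the sample-aware or sample-joint mechanisms, whose details are not given in the abstract.

```python
import random

EMBED_DIM = 4  # hypothetical size, kept tiny for readability


class AttributeMemory:
    """Toy attribute memory: one vector ("slot") per attribute phrase."""

    def __init__(self, attributes, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Random initialization stands in for learned embeddings.
        self.slots = {
            name: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for name in attributes
        }

    def read(self, attributes):
        """Average the slots of the attributes found in a description."""
        known = [self.slots[a] for a in attributes if a in self.slots]
        if not known:
            return [0.0] * self.dim
        return [sum(column) / len(known) for column in zip(*known)]

    def update(self, attribute, sample_feature, momentum=0.9):
        """Moving-average update of one slot (an assumed stand-in rule)."""
        old = self.slots[attribute]
        self.slots[attribute] = [
            momentum * o + (1.0 - momentum) * s
            for o, s in zip(old, sample_feature)
        ]


def joint_condition(sentence_embedding, memory, attributes):
    """Concatenate the sentence embedding with the attribute readout."""
    return list(sentence_embedding) + memory.read(attributes)


# Usage: build a condition vector for a hypothetical bird description.
memory = AttributeMemory(["red", "small", "long beak"])
sentence_embedding = [0.1, 0.2, 0.3, 0.4]  # placeholder text-encoder output
condition = joint_condition(sentence_embedding, memory, ["red", "small"])
print(len(condition))  # sentence part plus attribute part
```

In a real model the condition vector would be passed to the generator alongside the noise input, and the memory slots would be trained with the rest of the network rather than initialized at random and left fixed.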