Generating Images from Captions with Attention
This work addresses the problem of generating realistic images from text, with applications in creative AI and data augmentation, and represents an incremental improvement over existing generative models.
The paper tackles image generation from natural language descriptions by introducing a model that iteratively draws patches while attending to the relevant words. It achieves higher quality samples than baseline approaches on Microsoft COCO and generates novel scene compositions for unseen captions.
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
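The drawing loop described above can be illustrated with a minimal, untrained sketch. It is not the paper's architecture: the recurrent update, the bilinear attention score, the hard patch placement, and every name and size (`attend`, `draw_from_caption`, `W_att`, `patch_size`, and so on) are assumptions chosen for illustration. It only shows the control flow: at each step, weight the caption's word vectors by relevance to the current state, update the state, and add a small patch to a shared canvas.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attend(word_vecs, state, W_att):
    """Soft attention over caption words (hypothetical bilinear scoring).

    word_vecs: (n_words, d_word), state: (d_state,), W_att: (d_word, d_state)
    Returns the weighted word context and the attention weights.
    """
    scores = word_vecs @ W_att @ state   # one relevance score per word
    weights = softmax(scores)            # non-negative, sums to 1
    context = weights @ word_vecs        # (d_word,)
    return context, weights


def draw_from_caption(word_vecs, n_steps=8, canvas_size=16,
                      patch_size=4, d_state=32, seed=0):
    """Iteratively add patches to a canvas while attending to words.

    All weights are random (untrained); this is an assumption-laden sketch.
    """
    rng = np.random.default_rng(seed)
    d_word = word_vecs.shape[1]
    W_att = rng.normal(scale=0.1, size=(d_word, d_state))
    W_state = rng.normal(scale=0.1, size=(d_state, d_state + d_word))
    W_patch = rng.normal(scale=0.1, size=(patch_size * patch_size, d_state))
    W_pos = rng.normal(scale=0.1, size=(2, d_state))

    canvas = np.zeros((canvas_size, canvas_size))
    state = np.zeros(d_state)
    attention_log = []

    for _ in range(n_steps):
        # 1. Look at the caption: which words matter for this step?
        context, weights = attend(word_vecs, state, W_att)
        # 2. Update the recurrent state from the old state and the context.
        state = np.tanh(W_state @ np.concatenate([state, context]))
        # 3. Decode a patch and a top-left position from the state.
        patch = (W_patch @ state).reshape(patch_size, patch_size)
        pos = sigmoid(W_pos @ state) * (canvas_size - patch_size)
        row, col = pos.astype(int)
        # 4. Draw additively, so later steps refine earlier ones.
        canvas[row:row + patch_size, col:col + patch_size] += patch
        attention_log.append(weights)

    # Squash the accumulated canvas into pixel intensities in (0, 1).
    return sigmoid(canvas), np.array(attention_log)
```

Calling `draw_from_caption(word_vecs)` with a `(n_words, d_word)` array of word embeddings returns the final image and an `(n_steps, n_words)` array of attention weights, one row per drawing step. Accumulating patches additively on the canvas is the design choice that lets each step build on what earlier steps drew.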