Cycle Text-To-Image GAN with BERT
This work addresses text-to-image generation for AI applications, but it is incremental because it builds on existing GAN architectures.
The paper tackles image generation from captions by introducing a cyclic design that maps images back to their captions and by using BERT embeddings as the text featurizer; it reports noticeable qualitative and quantitative improvements over an Attention GAN baseline.
We explore novel approaches to the task of generating images from their captions, building on state-of-the-art GAN architectures. In particular, we baseline our models against attention-based GANs that learn attention mappings from words to image features. To better capture the features of the descriptions, we then build a novel cyclic design that learns an inverse function to map the image back to its original caption. Additionally, we incorporate recently developed BERT pretrained word embeddings as our initial text featurizer and observe a noticeable improvement in qualitative and quantitative performance compared to the Attention GAN baseline.
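The cyclic design described above can be illustrated with a minimal sketch. The abstract does not specify the loss or the network shapes, so everything here is an assumption made for illustration: the names `generator`, `inverse_mapper`, `cycle_loss`, and `total_generator_loss` are hypothetical, the two networks are replaced by toy linear maps, the reconstruction penalty is taken to be a mean squared error between the original and recovered caption embeddings, and `cycle_weight` is an invented balancing factor. In a real system the caption embedding would come from a text featurizer such as BERT, and the adversarial term from a discriminator.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def generator(caption_embedding, weights):
    """Toy stand-in for the generator: caption embedding -> image features."""
    return [math.tanh(v) for v in matvec(weights, caption_embedding)]


def inverse_mapper(image_features, weights):
    """Toy stand-in for the inverse function: image features -> caption embedding."""
    return matvec(weights, image_features)


def cycle_loss(caption_embedding, g_weights, f_weights):
    """Assumed cycle penalty: MSE between the caption embedding and its reconstruction."""
    image = generator(caption_embedding, g_weights)
    reconstructed = inverse_mapper(image, f_weights)
    squared_errors = [(a - b) ** 2 for a, b in zip(caption_embedding, reconstructed)]
    return sum(squared_errors) / len(squared_errors)


def total_generator_loss(adversarial_loss, cycle, cycle_weight=1.0):
    """Assumed combination: adversarial term plus a weighted cycle term."""
    return adversarial_loss + cycle_weight * cycle


if __name__ == "__main__":
    rng = random.Random(0)
    g_weights = random_matrix(6, 4, rng)  # 4-dim caption -> 6-dim image features
    f_weights = random_matrix(4, 6, rng)  # 6-dim image features -> 4-dim caption
    caption = [0.3, -0.2, 0.5, 0.1]       # placeholder for a BERT-style embedding
    cycle = cycle_loss(caption, g_weights, f_weights)
    print("cycle loss:", cycle)
    print("total loss:", total_generator_loss(0.7, cycle, cycle_weight=2.0))
```

The intent of such a penalty is that the generator is rewarded only when the generated image retains enough information to recover the caption, which is one plausible reading of how the inverse function helps "capture the features of the descriptions".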