CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
This work provides an incremental improvement for researchers in text-to-image generation by offering a simple baseline model.
The paper tackles text-to-image generation by proposing CanvasGAN, a recurrent model that incrementally patches a canvas while attending to the words of the caption, then upscales the canvas to produce an image. The authors show that it outperforms Reed et al.'s model, making it a stronger baseline for this task.
We propose a new recurrent generative model for generating images from text captions while attending to specific parts of the captions. Our model creates images by incrementally adding patches to a "canvas" while attending to words from the text caption at each timestep. Finally, the canvas is passed through an upscaling network to generate images. We also introduce a new method for generating visual-semantic sentence embeddings based on self-attention over text. We compare our model's generated images with those generated by Reed et al.'s model and show that our model is a stronger baseline for text-to-image generation tasks.
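
The generation loop described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' architecture: the layer shapes, the tanh recurrence, the additive canvas update, the use of the sentence embedding as the initial state, and the nearest-neighbour upscaling are all assumptions chosen to keep the example short, and the weights are random rather than trained. The names `sentence_embedding`, `generate_canvas`, `upscale`, and the `params` keys are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def sentence_embedding(words, w_att):
    # Self-attentive pooling (assumed form): score each word vector,
    # normalise the scores, and return the weighted sum of word vectors.
    alpha = softmax(words @ w_att)
    return alpha @ words, alpha

def generate_canvas(words, init_state, params, steps=4, size=8):
    # Incrementally add patches to a canvas while attending to words.
    canvas = np.zeros((size, size))
    hidden = init_state
    attn_history = []
    for _ in range(steps):
        # Attention over caption words, conditioned on the current state.
        alpha = softmax(words @ (params["W_q"] @ hidden))
        context = alpha @ words
        # Plain tanh recurrence, standing in for whatever RNN cell is used.
        hidden = np.tanh(params["W_h"] @ hidden + params["W_c"] @ context)
        # Decode the state into a canvas-sized patch and add it (assumed update).
        patch = (params["W_p"] @ hidden).reshape(size, size)
        canvas = canvas + patch
        attn_history.append(alpha)
    return canvas, np.stack(attn_history)

def upscale(canvas, factor=2):
    # Nearest-neighbour upscaling, standing in for a learned upscaling network.
    return np.kron(canvas, np.ones((factor, factor)))

rng = np.random.default_rng(0)
T, D, size = 5, 16, 8                      # words per caption, embedding dim, canvas side
words = rng.normal(size=(T, D))            # stand-in word embeddings
params = {
    "W_q": rng.normal(size=(D, D)) * 0.1,
    "W_h": rng.normal(size=(D, D)) * 0.1,
    "W_c": rng.normal(size=(D, D)) * 0.1,
    "W_p": rng.normal(size=(size * size, D)) * 0.1,
}

sent, sent_alpha = sentence_embedding(words, rng.normal(size=D))
canvas, attn = generate_canvas(words, np.tanh(sent), params, steps=4, size=size)
image = np.tanh(upscale(canvas, factor=2))  # squash to [-1, 1] like a typical GAN output
```

Each row of `attn` holds the attention weights over the caption words at one timestep, so inspecting it shows which words influenced each patch. In a trained model the random matrices would be learned jointly with a discriminator, which this sketch omits.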