Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network
This work addresses image generation for Bangla speakers; it is an incremental adaptation of existing methods to a new language.
The paper tackles fine-grained image generation from Bangla text descriptions using an attentional GAN, achieving a better inception score on the CUB dataset.
Generating fine-grained, realistic images from text has many applications in vision and language. We therefore propose a Bangla Attentional Generative Adversarial Network (AttnGAN) that uses attention-driven, multi-stage refinement for high-resolution Bangla text-to-image generation. The model synthesizes fine-grained details in different sub-regions of the image by attending to the most relevant words in the natural language description. The framework achieves a better inception score on the CUB dataset. For the first time, a fine-grained image is generated from Bangla text using an attentional GAN. Bangla ranks 7th among the 100 most spoken languages, which motivates us to focus explicitly on this language and serve the needs of its many speakers. Moreover, Bangla has a more complex syntactic structure and fewer natural language processing resources, which further motivates our work.
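The word-level attention described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the Bangla word embeddings have already been projected into the same feature space as the image sub-regions, and the function name, array shapes, and random inputs are hypothetical, chosen only for illustration. For each sub-region, the sketch scores every word by a dot product, normalizes the scores with a softmax over the words, and forms a word-context vector as the weighted sum of word features.

```python
import numpy as np

def word_attention(region_feats, word_feats):
    """Attend over words for each image sub-region (illustrative sketch).

    region_feats: (N, D) array, one feature vector per image sub-region.
    word_feats:   (T, D) array, one feature vector per word, assumed to be
                  already projected into the image feature space.
    Returns (context, weights) with shapes (N, D) and (N, T).
    """
    # Similarity between every sub-region and every word.
    scores = region_feats @ word_feats.T                  # (N, T)
    # Softmax over the word axis; subtract the row max for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # each row sums to 1
    # Word-context vector per sub-region: weighted sum of word features.
    context = weights @ word_feats                        # (N, D)
    return context, weights

# Hypothetical sizes: 4 sub-regions, 5 words, 8-dimensional features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))
words = rng.normal(size=(5, 8))
context, weights = word_attention(regions, words)
```

In an attentional GAN of this kind, such a context vector would typically be combined with the sub-region feature before the next generation stage, so that each region is refined using the words most relevant to it.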