CV, AI, LG | Oct 11, 2024

Text-To-Image with Generative Adversarial Networks

arXiv:2410.08608v1 | 2 citations | h-index: 2
Originality: Synthesis-oriented
AI Analysis

This is an incremental study comparing existing methods for text-to-image generation in computer vision.

The paper compares five GAN-based methods for text-to-image generation, finding that the best model achieves 256x256 resolution and the worst 64x64, with metrics used to evaluate accuracy.

Generating realistic images from human-written text is one of the most challenging problems in computer vision (CV). Existing text-to-image approaches can roughly reflect the meaning of a given description. In this paper, our main purpose is to present a brief comparison of five different methods based on Generative Adversarial Networks (GANs) for generating images from text. Each model architecture synthesizes images at a different resolution; the lowest and highest resolutions obtained are 64x64 and 256x256, respectively. We also examine and compare several metrics that measure the accuracy of each model. By comparing the essential metrics of these approaches, we identify the best model for this problem.

