CV · Apr 11, 2019

FTGAN: A Fully-trained Generative Adversarial Networks for Text to Face Generation

arXiv:1904.05729v1 · 42 citations
Originality: Incremental advance
AI Analysis

The paper addresses text-to-face synthesis for public safety applications, but its contribution is incremental: it builds on existing text-to-image methods and adds a new dataset.

The paper tackles text-to-face generation by proposing FTGAN, a fully-trained GAN that trains the text encoder and image decoder simultaneously. It reports an average 59% similarity to the ground truth on the new SCU-Text2face dataset and raises the best reported Inception Score on CUB to 4.63.

As a sub-domain of text-to-image synthesis, text-to-face generation has great potential in the public safety domain. Because datasets are lacking, almost no existing research focuses on text-to-face synthesis. In this paper, we propose a fully-trained Generative Adversarial Network (FTGAN) that trains the text encoder and image decoder at the same time for fine-grained text-to-face generation. With a novel fully-trained generative network, FTGAN can synthesize higher-quality images and make its outputs more relevant to the input sentences. In addition, we build a dataset called SCU-Text2face for text-to-face synthesis. Through extensive experiments, FTGAN shows its superiority in improving both the quality of the generated images and their similarity to the input descriptions. The proposed FTGAN outperforms the previous state of the art, boosting the best reported Inception Score to 4.63 on the CUB dataset. On SCU-Text2face, the face images that FTGAN generates from the input descriptions alone have an average 59% similarity to the ground truth, which sets a baseline for text-to-face synthesis.
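The abstract reports results as an Inception Score without describing how the metric is computed. As a rough illustration, the sketch below uses the standard definition, the exponential of the mean KL divergence between each image's class distribution p(y|x) and the marginal p(y). It assumes the per-image class probabilities are already available (in practice they come from a pretrained Inception classifier), and it skips the split-and-average step that common implementations use. The function name `inception_score` and its input format are assumptions for illustration, not the paper's evaluation code.

```python
import math


def inception_score(class_probs):
    """Inception Score from per-image class probabilities p(y|x).

    class_probs: list of rows, each row a probability distribution over
    classes. Supplying these directly (instead of running a pretrained
    Inception network) is an assumption of this sketch.
    """
    n = len(class_probs)
    num_classes = len(class_probs[0])

    # Marginal p(y): average of the conditional distributions over all images.
    marginal = [
        sum(row[k] for row in class_probs) / n for k in range(num_classes)
    ]

    # Sum of KL(p(y|x) || p(y)) over images; terms with p = 0 contribute 0.
    total_kl = 0.0
    for row in class_probs:
        total_kl += sum(
            p * math.log(p / marginal[k]) for k, p in enumerate(row) if p > 0
        )

    # Exponential of the mean KL divergence.
    return math.exp(total_kl / n)
```

Under this definition the score grows when each image is classified confidently and the predicted classes are spread across the whole set. It equals 1 when every image yields the same class distribution.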

