CV · LG · Sep 8, 2022

Text-Free Learning of a Natural Language Interface for Pretrained Face Generators

arXiv:2209.03953v1 · 2 citations · h-index: 73
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate text-to-image generation for human faces, offering a practical solution for applications in creative design or media, though it is incremental as it builds on existing GAN and CLIP frameworks.

The paper tackles the problem of text-guided human face synthesis by proposing Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs without requiring text data during training, resulting in faster and more accurate image generation compared to prior work.

We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Because the method leverages recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at test time. Our model does not require re-training or fine-tuning of the GANs or CLIP when encountering new text prompts. In contrast to prior work, we do not rely on optimization at test time, which makes our method orders of magnitude faster. Empirically, on the FFHQ dataset, our method offers faster and more accurate generation of images from natural language descriptions with varying levels of detail.
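The test-time behaviour described in the abstract (one forward pass per sample, with diversity coming from the CVAE noise) can be illustrated with a toy sketch. The abstract does not give the architecture, so everything here is an assumption made for illustration: the dimensions, the helper names `make_decoder` and `sample_latents`, and the random linear map standing in for a trained decoder. No real CLIP or StyleGAN model is called.

```python
import math
import random

# Hypothetical sizes, kept tiny for illustration only.
COND_DIM = 4   # stands in for a CLIP embedding
NOISE_DIM = 3  # CVAE noise vector z
W_DIM = 5      # stands in for a generator latent code


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def make_decoder(rng):
    """Build a random linear map standing in for a trained CVAE decoder (assumption)."""
    rows = [
        [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM + COND_DIM)]
        for _ in range(W_DIM)
    ]

    def decode(z, cond):
        # Concatenate noise and normalized condition, then apply the linear map.
        x = list(z) + l2_normalize(cond)
        return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

    return decode


def sample_latents(decode, cond, n, rng):
    """Draw n latent codes for one condition: one forward pass each, no optimization loop."""
    return [
        decode([rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)], cond)
        for _ in range(n)
    ]


rng = random.Random(0)
decode = make_decoder(rng)
text_embedding = [0.2, -0.1, 0.7, 0.4]  # placeholder for a CLIP text embedding
latents = sample_latents(decode, text_embedding, n=3, rng=rng)
```

In this sketch, each entry of `latents` would be passed to a frozen generator to produce one image. Different noise draws give different codes for the same prompt embedding, which is the source of diversity the abstract attributes to the CVAE formulation.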

Code Implementations: 1 repo
