CV · Aug 14, 2019

Dual Adversarial Inference for Text-to-Image Synthesis

arXiv:1908.05324v1 · 39 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating images that reflect both the information stated explicitly in a text description and the information the text leaves implicit. It is relevant to AI and computer vision applications, though the advance is incremental.

The paper tackles text-to-image synthesis by learning disentangled content and style representations, and reports improved image quality on the Oxford-102, CUB, and COCO datasets.

Synthesizing images from a given text description involves engaging two types of information: the content, which includes information explicitly described in the text (e.g., color, composition, etc.), and the style, which is usually not well described in the text (e.g., location, quantity, size, etc.). However, in previous works, it is typically treated as a process of generating images only from the content, i.e., without considering learning meaningful style representations. In this paper, we aim to learn two variables that are disentangled in the latent space, representing content and style respectively. We achieve this by augmenting current text-to-image synthesis frameworks with a dual adversarial inference mechanism. Through extensive experiments, we show that our model learns, in an unsupervised manner, style representations corresponding to certain meaningful information present in the image that are not well described in the text. The new framework also improves the quality of synthesized images when evaluated on Oxford-102, CUB and COCO datasets.
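The abstract names a dual adversarial inference mechanism but does not spell out its form. As a rough illustration only, the sketch below assumes an adversarially learned inference (ALI/BiGAN-style) setup, in which a discriminator compares joint (image, latent code) pairs, and applies it twice: once for the content code and once for the style code. The linear discriminators, the toy vectors, and every function and variable name here are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def bce(prob, target):
    # Binary cross-entropy for one probability, clamped to avoid log(0).
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def joint_score(weights, image, code):
    # Hypothetical linear discriminator over the concatenated (image, code) pair.
    pair = image + code
    return sigmoid(sum(w * v for w, v in zip(weights, pair)))


def dual_discriminator_loss(w_content, w_style,
                            real_image, inferred_content, inferred_style,
                            fake_image, sampled_content, sampled_style):
    # Assumption: each discriminator labels (real image, inferred code) pairs
    # as 1 and (generated image, sampled code) pairs as 0, as in ALI/BiGAN.
    content_loss = (
        bce(joint_score(w_content, real_image, inferred_content), 1.0)
        + bce(joint_score(w_content, fake_image, sampled_content), 0.0)
    )
    style_loss = (
        bce(joint_score(w_style, real_image, inferred_style), 1.0)
        + bce(joint_score(w_style, fake_image, sampled_style), 0.0)
    )
    return content_loss + style_loss


# Toy inputs: 3-dimensional "images" and 2-dimensional latent codes.
real_image = [0.2, 0.5, 0.9]
fake_image = [0.1, 0.4, 0.7]
inferred_content, inferred_style = [0.3, -0.1], [0.8, 0.2]
sampled_content, sampled_style = [0.0, 0.5], [-0.4, 0.6]

# Untrained discriminators (all-zero weights) score every pair at 0.5.
zero_weights = [0.0] * 5
loss = dual_discriminator_loss(zero_weights, zero_weights,
                               real_image, inferred_content, inferred_style,
                               fake_image, sampled_content, sampled_style)
print(loss)
```

With all-zero weights each of the four terms equals ln 2, so the total is 4 ln 2. In a real model the encoder and generator would be trained to fool both discriminators, which is what would push the inferred content and style codes toward their respective priors.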
