CV, CL, LG | Feb 12, 2020

Image-to-Image Translation with Text Guidance

arXiv:2002.05235v1 | 20 citations
AI Analysis

The paper addresses the challenge of text-guided image synthesis for applications such as content creation, but its contribution is incremental because it builds on existing GAN-based translation methods.

This paper tackles the problem of embedding natural language descriptions into image-to-image translation using GANs, achieving superior performance on visual realism and semantic consistency with given descriptions on the COCO dataset.

The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse text and image features from different modalities, (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve discriminators to better distinguish real and synthetic images. Extensive experiments on the COCO dataset demonstrate that our method achieves superior performance on both visual realism and semantic consistency with given descriptions.
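As a rough illustration of component (1), the sketch below filters a description down to words whose part-of-speech tag looks semantic. It is only a minimal sketch under stated assumptions: the abstract does not say which tags the paper keeps or which tagger it uses, so the tag set `SEMANTIC_TAG_PREFIXES`, the function name `filter_semantic_words`, and the hand-tagged example sentence are all hypothetical.

```python
# Hypothetical choice of Penn Treebank tag prefixes treated as "semantic"
# (nouns, adjectives, verbs). The source does not specify the actual set.
SEMANTIC_TAG_PREFIXES = ("NN", "JJ", "VB")


def filter_semantic_words(tagged_tokens):
    """Keep only the words whose POS tag starts with a semantic prefix.

    tagged_tokens: list of (word, tag) pairs, e.g. the output of any
    part-of-speech tagger that emits Penn Treebank tags.
    """
    return [
        word
        for word, tag in tagged_tokens
        if tag.startswith(SEMANTIC_TAG_PREFIXES)
    ]


# Hand-tagged example so the sketch needs no tagger or model download.
tagged = [
    ("a", "DT"),
    ("red", "JJ"),
    ("bus", "NN"),
    ("parked", "VBD"),
    ("on", "IN"),
    ("the", "DT"),
    ("street", "NN"),
]

kept = filter_semantic_words(tagged)
print(kept)  # ['red', 'bus', 'parked', 'street']
```

In a pipeline of this kind, the filtered words would presumably be what the text encoder sees, so that determiners and prepositions do not dilute the description; how the paper wires this in is not stated in the abstract.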

