CV AI LG ML
Sep 20, 2018

C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis

K J Joseph, Arghya Pal, Sailaja Rajanala, Vineeth N Balasubramanian

arXiv:1809.10238v17.827 citations

Originality: Incremental advance

AI Analysis

The paper addresses the limitation of single-caption text-to-image synthesis, which matters for applications such as image editing and virtual reality, though the advance is incremental because it builds on existing methods.

The paper tackles the problem of generating images from text descriptions by using multiple captions instead of a single one, achieving improved results on the CUB and Oxford-102 Flowers datasets.

Generating an image from its description is a challenging task worth solving because of its numerous practical applications, ranging from image editing to virtual reality. All existing methods use a single caption to generate a plausible image. A single caption by itself can be limited and may not capture the variety of concepts and behavior that may be present in the image. We propose two deep generative models that generate an image by making use of multiple captions describing it. This is achieved by ensuring 'Cross-Caption Cycle Consistency' between the multiple captions and the generated image(s). We report quantitative and qualitative results on the standard Caltech-UCSD Birds (CUB) and Oxford-102 Flowers datasets to validate the efficacy of the proposed approach.
