CV CL · Aug 20, 2021

CIGLI: Conditional Image Generation from Language & Image

Xiaopeng Lu, Lynnette Ng, Jared Fernandez, Hao Zhu

arXiv:2108.08955v1 · 4.76 citations · h-index: 19 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a multi-modal generation challenge of interest to AI researchers, but the advance is incremental because it builds on existing text-to-image generation methods.

The paper tackles the problem of generating an image from both a textual description and an image prompt, and proposes the CIGLI task to formalize it. The authors report that their language-image fusion model outperforms two baseline methods in both quantitative and qualitative evaluations.

Multi-modal generation has been widely explored in recent years. Current research directions involve generating text based on an image or vice versa. In this paper, we propose a new task called CIGLI: Conditional Image Generation from Language and Image. Instead of generating an image based on text as in text-image generation, this task requires the generation of an image from a textual description and an image prompt. We designed a new dataset to ensure that the text description describes information from both images, and that solely analyzing the description is insufficient to generate an image. We then propose a novel language-image fusion model which improves the performance over two established baseline methods, as evaluated by quantitative (automatic) and qualitative (human) evaluations. The code and dataset are available at https://github.com/vincentlux/CIGLI.
