CV MM · Sep 8, 2023

Style Generation: Image Synthesis based on Coarsely Matched Texts

Mengyao Cui, Zhe Zhu, Shao-Ping Lu, Yulu Yang

arXiv:2309.04608v1 · 1.51 citations · h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific challenge in text-to-image synthesis, with applications such as story visualization. The contribution is incremental: it builds on existing methods, adding a new task and dataset.

The paper tackles the problem of stylizing images under coarsely matched text guidance. It introduces a two-stage GAN that first generates the overall style from a sentence feature and then refines it with a synthetic feature, and validates the framework through extensive experiments and ablation studies.

Previous text-to-image synthesis algorithms typically use explicit textual instructions to generate or manipulate images accurately, but they have difficulty adapting to guidance in the form of coarsely matched texts. In this work, we attempt to stylize an input image using such coarsely matched text as guidance. To tackle this new problem, we introduce a novel task called text-based style generation and propose a two-stage generative adversarial network: the first stage generates the overall image style from a sentence feature, and the second stage refines the generated style with a synthetic feature produced by a multi-modality style synthesis module. We re-filter an existing dataset and collect a new one for the task. Extensive experiments and ablation studies are conducted to validate our framework. The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization. Our datasets are published at https://www.kaggle.com/datasets/mengyaocui/style-generation.
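The abstract describes the two-stage generator only at a high level. The following is a minimal, hypothetical NumPy sketch of that data flow, not the authors' implementation: the dimensions, the layer names (`stage1_style`, `synthesis`, `stage2_style`), the AdaIN-style modulation, the mean pooling, and the residual refinement in stage 2 are all assumptions made for illustration. It shows only a forward pass with random weights; there is no discriminator or adversarial training.

```python
import numpy as np

# Hypothetical dimensions, not taken from the paper.
C, H, W = 8, 16, 16      # channels/height/width of the image feature map
D = 32                   # size of the sentence embedding

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    """Randomly initialised linear layer, returned as (weight, bias)."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1, np.zeros(out_dim)


def apply(layer, x):
    weight, bias = layer
    return x @ weight + bias


def modulate(feat, style):
    """AdaIN-style modulation (an assumption, not confirmed by the paper):
    normalise each channel of feat (C, H, W), then scale by (1 + gamma)
    and shift by beta, where style = concat(gamma, beta)."""
    c = feat.shape[0]
    gamma, beta = style[:c], style[c:]
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-5
    normed = (feat - mean) / std
    return (1 + gamma)[:, None, None] * normed + beta[:, None, None]


# Hypothetical layers for the two stages and the synthesis module.
stage1_style = linear(D, 2 * C)   # sentence feature -> (gamma, beta)
synthesis = linear(D + C, D)      # [sentence, pooled image] -> synthetic feature
stage2_style = linear(D, 2 * C)   # synthetic feature -> (gamma, beta)


def style_generation(image_feat, sentence_feat):
    # Stage 1: overall style driven by the sentence feature alone.
    coarse = modulate(image_feat, apply(stage1_style, sentence_feat))
    # Multi-modality fusion (assumed form): combine the text embedding with
    # a pooled summary of the stage-1 result into a "synthetic" feature.
    pooled = coarse.mean(axis=(1, 2))
    synthetic = np.tanh(apply(synthesis, np.concatenate([sentence_feat, pooled])))
    # Stage 2: refine the coarse result with the synthetic feature
    # (the residual update is an assumption of this sketch).
    refined = coarse + modulate(coarse, apply(stage2_style, synthetic))
    return coarse, refined


image_feat = rng.standard_normal((C, H, W))
sentence_feat = rng.standard_normal(D)
coarse, refined = style_generation(image_feat, sentence_feat)
print(coarse.shape, refined.shape)  # (8, 16, 16) (8, 16, 16)
```

Both stages keep the feature-map shape, so in this sketch stage 2 acts as a refinement of the stage-1 result rather than producing a new image from scratch; feeding pooled stage-1 features into the synthesis module is one simple way to make the second stage depend on both modalities.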

