CV · Aug 29, 2024

CSGO: Content-Style Composition in Text-to-Image Generation

arXiv:2408.16766v2 · 79 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the data scarcity issue in style transfer for researchers and practitioners, offering a new dataset and model for improved control in image generation, though it is incremental in building upon existing diffusion-based methods.

The authors tackled the problem of limited data for style transfer in text-to-image generation by constructing IMAGStyle, a large-scale dataset of 210k content-style-stylized image triplets, and proposed CSGO, an end-to-end model that decouples content and style features, achieving enhanced style control capabilities.

Diffusion models have shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Existing works mainly focus on training-free methods (e.g., image inversion) due to the scarcity of specific data. In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cleanses stylized data triplets. Based on this pipeline, we construct IMAGStyle, the first large-scale style transfer dataset, containing 210k image triplets and available for the community to explore and research. Equipped with IMAGStyle, we propose CSGO, a style transfer model based on end-to-end training, which explicitly decouples content and style features by employing independent feature injection. The unified CSGO implements image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis. Extensive experiments demonstrate the effectiveness of our approach in enhancing style control capabilities in image generation. Additional visualizations and the source code are available on the project page: https://csgo-gen.github.io/
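As a rough illustration of what a content-style-stylized triplet and an automatic cleaning step might look like, here is a minimal Python sketch. The abstract does not say how triplets are scored or filtered, so the `Triplet` fields, `clean_triplets`, `score_fn`, and `threshold` are all hypothetical names chosen for this example, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triplet:
    """One content-style-stylized record (hypothetical layout).

    Each field holds an image path or identifier.
    """
    content: str
    style: str
    stylized: str


def clean_triplets(
    candidates: Iterable[Triplet],
    score_fn: Callable[[Triplet], float],
    threshold: float,
) -> List[Triplet]:
    """Keep only triplets whose score is at least `threshold`.

    `score_fn` is an assumed placeholder for whatever quality measure
    a real pipeline would use; it is not specified in the abstract.
    """
    return [t for t in candidates if score_fn(t) >= threshold]
```

In use, a caller would generate many candidate stylized images per content-style pair, wrap each as a `Triplet`, and pass a scoring function of their choice to `clean_triplets` to discard low-quality candidates.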

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
