CV
May 24, 2024

ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model

Tencent
arXiv:2405.15287v2
h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses stylized text-to-image generation for applications in creative design and media. The advance appears incremental, since the framework builds on pretrained Stable Diffusion.

The paper tackles the problem of generating images from text prompts and style reference images. It introduces ArtWeaver, a framework designed to improve style integration and semantic consistency, and reports that it outperforms existing methods in the authors' experiments.

Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. In this paper, we present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion (SD) to address challenges such as misinterpreted styles and inconsistent semantics. Our approach introduces two innovative modules: the mixed style descriptor and the dynamic attention adapter. The mixed style descriptor enhances SD by combining content-aware and frequency-disentangled embeddings from CLIP with additional sources that capture global statistics and textual information, thus providing a richer blend of style-related and semantic-related knowledge. To achieve a better balance between adapter capacity and semantic control, the dynamic attention adapter is integrated into the diffusion UNet, dynamically calculating adaptation weights based on the style descriptors. Additionally, we introduce two objective functions to optimize the model alongside the denoising loss, further enhancing semantic and style consistency. Extensive experiments demonstrate the superiority of ArtWeaver over existing methods, producing images with diverse target styles while maintaining the semantic integrity of the text prompts.
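The abstract says the dynamic attention adapter computes adaptation weights from the style descriptors but does not give the mechanism. As a rough illustration only, the sketch below assumes one possible reading: a frozen projection matrix plus a low-rank update whose per-rank scales are produced from the style descriptor. The class name `DynamicProjection`, the `gate` matrix, and all dimensions are hypothetical and are not taken from the paper.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DynamicProjection:
    """Hypothetical sketch: frozen projection W plus a style-conditioned
    low-rank update. This is an assumed design, not the paper's method."""

    def __init__(self, dim, style_dim, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(dim, dim, rng)            # stands in for a frozen pretrained weight
        self.down = rand_matrix(rank, dim, rng)        # adapter down-projection
        self.up = rand_matrix(dim, rank, rng)          # adapter up-projection
        self.gate = rand_matrix(rank, style_dim, rng)  # maps style descriptor to per-rank scales

    def __call__(self, x, style):
        base = matvec(self.W, x)
        # Adaptation weights are computed from the style descriptor at call time.
        scales = matvec(self.gate, style)
        hidden = [h * s for h, s in zip(matvec(self.down, x), scales)]
        delta = matvec(self.up, hidden)
        return [b + d for b, d in zip(base, delta)]


proj = DynamicProjection(dim=4, style_dim=3, rank=2)
x = [1.0, 0.5, -0.5, 2.0]
out_style_a = proj(x, [1.0, 0.0, 0.0])
out_style_b = proj(x, [0.0, 1.0, 0.0])
out_no_style = proj(x, [0.0, 0.0, 0.0])
```

In this sketch a zero style descriptor leaves the frozen projection unchanged, while different descriptors produce different updates for the same input. That is one way an adapter could trade capacity against semantic control, though the paper's actual formulation may differ.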

