CV · Aug 15, 2023

StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

arXiv:2308.07863v1 · 222 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of interpretable and controllable style transfer for image processing applications, though its use of diffusion models is an incremental advance.

The paper tackles the problem of content and style disentanglement in style transfer by proposing a framework that explicitly extracts content and implicitly learns style, achieving superior results and flexible control compared to state-of-the-art methods.

Content and style (C-S) disentanglement is a fundamental problem and critical challenge of style transfer. Existing approaches based on explicit definitions (e.g., Gram matrix) or implicit learning (e.g., GANs) are neither interpretable nor easy to control, resulting in entangled representations and less satisfactory results. In this paper, we propose a new C-S disentangled framework for style transfer without using previous assumptions. The key insight is to explicitly extract the content information and implicitly learn the complementary style information, yielding interpretable and controllable C-S disentanglement and style transfer. A simple yet effective CLIP-based style disentanglement loss coordinated with a style reconstruction prior is introduced to disentangle C-S in the CLIP image space. By further leveraging the powerful style removal and generative ability of diffusion models, our framework achieves results superior to the state of the art, along with flexible C-S disentanglement and trade-off control. Our work provides new insights into the C-S disentanglement in style transfer and demonstrates the potential of diffusion models for learning well-disentangled C-S characteristics.
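
The abstract names a CLIP-based style disentanglement loss and a style reconstruction prior but does not give their formulas. The sketch below is one plausible reading, not the authors' definition: it assumes style is the residual left in CLIP image space after subtracting the embedding of the explicitly extracted content, and that the loss aligns the residual direction of the output with that of the style image. All function names are hypothetical, the embeddings are plain NumPy arrays standing in for real CLIP features, and the L1 form of the prior is an assumption.

```python
import numpy as np

def unit(v, eps=1e-8):
    """Normalize vectors along the last axis (eps avoids division by zero)."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def style_residual(e_image, e_content):
    """Assumed definition: style is what remains in embedding space
    once the explicit content embedding is subtracted."""
    return e_image - e_content

def disentanglement_loss(e_style_img, e_style_content, e_out, e_out_content):
    """Assumed directional loss: 1 - cosine similarity between the style
    residual of the reference style image and that of the stylized output."""
    d_ref = unit(style_residual(e_style_img, e_style_content))
    d_out = unit(style_residual(e_out, e_out_content))
    return float(np.mean(1.0 - np.sum(d_ref * d_out, axis=-1)))

def reconstruction_prior(reconstructed, style_img):
    """Assumed L1 prior: restyling the style image's own content
    should give back the style image."""
    return float(np.mean(np.abs(reconstructed - style_img)))

def total_loss(l_disentangle, l_recon, weight=1.0):
    """Hypothetical weighting between the two terms."""
    return l_disentangle + weight * l_recon
```

Under these assumptions the directional term is 0 when the two style residuals point the same way and 2 when they point in opposite directions, regardless of their magnitudes.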

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
