CV · AI · Jul 18, 2025

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

arXiv:2507.13984v1 · 6 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses the problem of disentangling content and style in images for creative visual synthesis and offers a new approach for VAR models. The advance is incremental, since it adapts existing decomposition concepts to a new framework.

The paper tackles content-style decomposition in visual autoregressive models to enable recontextualization and stylization. It proposes CSD-VAR, which combines scale-aware optimization, SVD-based rectification, and an augmented key-value memory, and it reports better content preservation and stylization fidelity than prior methods.

Disentangling content and style from a single image, known as content-style decomposition (CSD), enables recontextualization of extracted content and stylization of extracted styles, offering greater creative flexibility in visual synthesis. While recent personalization methods have explored explicit content-style decomposition, they remain tailored for diffusion models. Meanwhile, Visual Autoregressive Modeling (VAR) has emerged as a promising alternative with a next-scale prediction paradigm, achieving performance comparable to that of diffusion models. In this paper, we explore VAR as a generative framework for CSD, leveraging its scale-wise generation process for improved disentanglement. To this end, we propose CSD-VAR, a novel method that introduces three key innovations: (1) a scale-aware alternating optimization strategy that aligns content and style representation with their respective scales to enhance separation, (2) an SVD-based rectification method to mitigate content leakage into style representations, and (3) an Augmented Key-Value (K-V) memory enhancing content identity preservation. To benchmark this task, we introduce CSD-100, a dataset specifically designed for content-style decomposition, featuring diverse subjects rendered in various artistic styles. Experiments demonstrate that CSD-VAR outperforms prior approaches, achieving superior content preservation and stylization fidelity.
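The abstract does not spell out how the SVD-based rectification works, so the following is only a minimal sketch of one plausible reading: estimate a low-rank "content subspace" from a set of content embeddings via SVD, then subtract the style embedding's projection onto that subspace. The function name `rectify_style`, the rank parameter `k`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np

def rectify_style(style_vec, content_mat, k=1):
    """Remove from style_vec its component in the top-k content subspace.

    Assumption (not from the paper): rows of content_mat are content
    embeddings, and the leading right singular vectors approximate the
    directions along which content leaks into the style embedding.
    """
    # Thin SVD; rows of vt are orthonormal directions in embedding space.
    _, _, vt = np.linalg.svd(content_mat, full_matrices=False)
    basis = vt[:k]                            # shape (k, d)
    leakage = basis.T @ (basis @ style_vec)   # projection onto the subspace
    return style_vec - leakage

# Toy example: all content embeddings lie along the first axis.
content = np.array([[1.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0]])
style = np.array([3.0, 4.0, 0.0])
rectified = rectify_style(style, content, k=1)
```

In this toy case the component of `style` along the content axis is removed and the orthogonal part is left unchanged. How the actual method chooses the content embeddings, the rank, and the scales at which rectification is applied is not stated in the abstract.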

