StyleDecoupler: Generalizable Artistic Style Disentanglement
This work addresses the problem of artistic style representation for researchers and practitioners in computer vision and art analysis, offering a generalizable method with applications in style retrieval and generative model evaluation.
The paper tackles the challenge of disentangling artistic style from semantic content. It proposes StyleDecoupler, an information-theoretic framework that isolates style features by using uni-modal representations as content references, and it reports state-of-the-art style retrieval performance on the WeART and WikiART datasets.
Representing artistic style is challenging due to its deep entanglement with semantic content. We propose StyleDecoupler, an information-theoretic framework that leverages a key insight: multi-modal vision models encode both style and content, while uni-modal models suppress style to focus on content-invariant features. By using uni-modal representations as content-only references, we isolate pure style features from multi-modal embeddings through mutual information minimization. StyleDecoupler operates as a plug-and-play module on frozen Vision-Language Models without fine-tuning. We also introduce WeART, a large-scale benchmark of 280K artworks across 152 styles and 1,556 artists. Experiments show state-of-the-art performance on style retrieval across WeART and WikiART, while enabling applications like style relationship mapping and generative model evaluation. We release our method and dataset at this url.
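As a rough illustration of the idea in the abstract, the sketch below removes from a multi-modal embedding whatever can be linearly predicted from a uni-modal content reference, then ranks artworks by cosine similarity of the residual. This is not the authors' method: ridge-regularized linear residualization is swapped in as a simple stand-in for mutual information minimization. It only decorrelates the residual from the content features, which corresponds to zero mutual information only under a joint Gaussian assumption. The function names, the `ridge` parameter, and the synthetic embeddings are all hypothetical.

```python
import numpy as np


def residualize_style(multimodal, content, ridge=1e-6):
    """Return the part of `multimodal` not linearly predictable from `content`.

    Assumption: a linear, decorrelation-based proxy for the mutual
    information minimization described in the abstract, not the paper's
    actual objective.
    """
    m = multimodal - multimodal.mean(axis=0)  # center multi-modal embeddings
    c = content - content.mean(axis=0)        # center content references
    # Ridge-regularized least squares: weights = (c^T c + ridge * I)^-1 c^T m
    gram = c.T @ c + ridge * np.eye(c.shape[1])
    weights = np.linalg.solve(gram, c.T @ m)
    return m - c @ weights                    # residual, treated as "style"


def retrieve_by_style(style, query_index, top_k=5):
    """Rank items by cosine similarity of style features to one query item."""
    normed = style / np.linalg.norm(style, axis=1, keepdims=True)
    scores = normed @ normed[query_index]
    scores[query_index] = -np.inf             # never return the query itself
    return np.argsort(-scores)[:top_k]


# Synthetic stand-ins for frozen encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
n_items = 500
content_latent = rng.normal(size=(n_items, 8))
style_latent = rng.normal(size=(n_items, 4))

# Uni-modal embedding: depends on content only.
content_emb = content_latent @ rng.normal(size=(8, 8))
# Multi-modal embedding: mixes content and style.
multimodal_emb = (content_latent @ rng.normal(size=(8, 32))
                  + style_latent @ rng.normal(size=(4, 32)))

style_emb = residualize_style(multimodal_emb, content_emb)
neighbors = retrieve_by_style(style_emb, query_index=0, top_k=5)
```

In practice `content_emb` and `multimodal_emb` would come from frozen pretrained encoders rather than random projections. The linear solve keeps the example dependency-free, but it cannot remove nonlinear dependence between content and style.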