CV | Aug 18, 2025

WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art

arXiv:2508.12668v1 | 3.6 | h-index: 6 | 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for automated formal analysis in art history and computational aesthetics, though it is incremental as it adapts an existing vision-language model to a specific domain.

The paper tackles the problem of predicting Wölfflin's five principles of stylistic analysis in visual art, for which no effective metric existed, by fine-tuning CLIP on annotated art datasets. The resulting model, WP-CLIP, is evaluated on GAN-generated paintings and the Pandora-18K dataset, where it generalizes across diverse artistic styles.

Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, composition, and thematic choices. Recent advancements in vision-language models (VLMs) have demonstrated their ability to evaluate abstract image attributes, making them promising candidates for this task. In this work, we investigate whether CLIP, pre-trained on large-scale data, can understand and predict Wölfflin's principles. Our findings indicate that it does not inherently capture such nuanced stylistic elements. To address this, we fine-tune CLIP on annotated datasets of real art images to predict a score for each principle. We evaluate our model, WP-CLIP, on GAN-generated paintings and the Pandora-18K art dataset, demonstrating its ability to generalize across diverse artistic styles. Our results highlight the potential of VLMs for automated art analysis.
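One common way to probe whether a pre-trained vision-language model captures an attribute is to compare an image embedding against text embeddings for two opposing prompts and turn the two similarities into a score. The sketch below illustrates that general antonym-prompt idea for the five principle pairs. It is an illustration under stated assumptions, not the paper's method: the embeddings are plain Python lists standing in for CLIP outputs, and the names `pair_score`, `score_principles`, and the `temperature` value are hypothetical choices made for this example.

```python
import math

# Wölfflin's five principle pairs (standard art-historical terminology).
PRINCIPLES = [
    ("linear", "painterly"),
    ("plane", "recession"),
    ("closed form", "open form"),
    ("multiplicity", "unity"),
    ("absolute clarity", "relative clarity"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_score(image_emb, first_emb, second_emb, temperature=0.01):
    """Two-way softmax over cosine similarities.

    Returns the weight on the second pole, a value in (0, 1).
    The temperature is an assumed value, not taken from the paper.
    """
    s_first = cosine(image_emb, first_emb) / temperature
    s_second = cosine(image_emb, second_emb) / temperature
    # A two-way softmax reduces to a sigmoid of the logit difference.
    return 1.0 / (1.0 + math.exp(s_first - s_second))


def score_principles(image_emb, prompt_embs):
    """Score one image on all five pairs.

    prompt_embs maps each pole name (e.g. "linear") to its text embedding,
    which a real pipeline would obtain from a text encoder.
    """
    return {
        f"{first} vs {second}": pair_score(
            image_emb, prompt_embs[first], prompt_embs[second]
        )
        for first, second in PRINCIPLES
    }
```

A score near 0 means the image embedding sits closer to the first pole of a pair, and a score near 1 means it sits closer to the second. The abstract reports that pre-trained CLIP does not inherently capture these distinctions, which is the motivation for fine-tuning on annotated art images rather than relying on this kind of zero-shot comparison.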
