CV, CL · Nov 28, 2025

Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols

arXiv:2511.22929v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of automating art interpretation for art historians and AI researchers. It is incremental, however: it applies existing VLMs to a new domain without major methodological innovation.

The study evaluates three vision language models (VLMs) on their ability to interpret emotions and symbols in artworks. It finds that they perform well on concrete images but struggle with abstract or symbolic ones, and that they give inconsistent answers to related questions.

Emotions are a fundamental aspect of artistic expression. Because emotions are abstract, artworks realize them in a broad spectrum of ways; these realizations are subject to historical change, and analyzing them requires expertise in art history. In this article, we investigate which aspects of emotional expression current (2025) vision language models (VLMs) can detect. We present a case study of three VLMs (Llava-Llama and two Qwen models) in which we ask the models four sets of questions of increasing complexity about artworks (general content, emotional content, expression of emotions, and emotion symbols) and carry out a qualitative expert evaluation of their answers. We find that the VLMs recognize the content of the images surprisingly well and often also identify which emotions are depicted and how they are expressed. The models perform best on concrete images but fail on highly abstract or highly symbolic ones. Reliable recognition of symbols remains fundamentally difficult. Furthermore, the models still exhibit the well-known LLM weakness of giving inconsistent answers to related questions.
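The study design described in the abstract amounts to a grid: every model is asked every question set about every artwork, and the collected answers are then rated by an expert. The Python sketch below only illustrates that grid under stated assumptions. The `query` callable, the `Answer` record, and the two Qwen labels are hypothetical placeholders (the abstract does not name the Qwen variants, the prompts, or any tooling), and the expert rating is left empty because the evaluation is qualitative and done by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# The four question sets named in the abstract, in order of increasing complexity.
QUESTION_SETS = [
    "general content",
    "emotional content",
    "expression of emotions",
    "emotion symbols",
]

# Hypothetical labels: the abstract mentions Llava-Llama and "two Qwen models"
# without saying which Qwen variants were used.
MODELS = ["Llava-Llama", "Qwen (model 1)", "Qwen (model 2)"]


@dataclass
class Answer:
    model: str
    artwork: str
    question_set: str
    text: str
    # Placeholder for the qualitative judgement an art-history expert adds later.
    expert_rating: Optional[str] = None


def collect_answers(
    artworks: List[str],
    query: Callable[[str, str, str], str],
) -> List[Answer]:
    """Ask every model every question set about every artwork.

    `query(model, artwork, question_set)` is a hypothetical stand-in for a real
    VLM call; it is not an interface described in the paper.
    """
    answers = []
    for artwork in artworks:
        for model in MODELS:
            for question_set in QUESTION_SETS:
                text = query(model, artwork, question_set)
                answers.append(Answer(model, artwork, question_set, text))
    return answers


if __name__ == "__main__":
    # Dummy query function so the sketch runs without any model installed.
    fake_query = lambda m, a, q: f"{m} on {a}: {q}"
    answers = collect_answers(["artwork_1.jpg"], fake_query)
    print(len(answers))  # prints 12 (3 models x 4 question sets x 1 artwork)
```

Keeping each answer as a separate record, tagged with its model and question set, makes it straightforward for an expert to compare one model's answers across related questions, which is where the inconsistencies reported in the abstract would show up.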
