CV · CL · Oct 13, 2021

Understanding of Emotion Perception from Art

arXiv:2110.06486v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the computational modeling of emotion perception from art, but its contribution is incremental: it applies existing multimodal methods to a specific domain.

The paper studies the emotions that artwork evokes in viewers, framing the problem as multimodal classification over images and viewer-written text captions. Single-stream transformer models such as MMBT and VisualBERT outperform both image-only and dual-stream models. Compared with a text-only BERT model, MMBT also improves performance on the extreme positive and negative emotion classes.

Computational modeling of the emotions evoked by art in humans is a challenging problem because of the subjective and nuanced nature of art and affective signals. In this paper, we consider the above-mentioned problem of understanding emotions evoked in viewers by artwork using both text and visual modalities. Specifically, we analyze images and the accompanying text captions from the viewers expressing emotions as a multimodal classification task. Our results show that single-stream multimodal transformer-based models like MMBT and VisualBERT perform better compared to both image-only models and dual-stream multimodal models having separate pathways for text and image modalities. We also observe improvements in performance for extreme positive and negative emotion classes, when a single-stream model like MMBT is compared with a text-only transformer model like BERT.
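The single-stream versus dual-stream distinction in the abstract can be illustrated with a toy sketch. This is not the paper's implementation, nor the architecture of MMBT or VisualBERT: the function names, the identity-projection attention, the mean pooling, and the late fusion by concatenation are all assumptions made for illustration (real dual-stream models may also exchange information between pathways). It only shows that in a single stream, text and image positions share one attention pass, whereas separate pathways encode each modality independently before fusing.

```python
import math

def self_attention(seq):
    """Toy single-head self-attention with identity projections (no learned weights)."""
    dim = len(seq[0])
    out = []
    for q in seq:
        # Scaled dot-product scores of this position against every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of all vectors in the sequence.
        out.append([sum(w / total * v[i] for w, v in zip(weights, seq))
                    for i in range(dim)])
    return out

def mean_pool(seq):
    """Average the sequence into one vector."""
    dim = len(seq[0])
    return [sum(v[i] for v in seq) / len(seq) for i in range(dim)]

def single_stream(text_vecs, image_vecs):
    """Hypothetical single-stream encoder: one joint sequence, one attention pass."""
    joint = self_attention(text_vecs + image_vecs)
    return mean_pool(joint)

def dual_stream(text_vecs, image_vecs):
    """Hypothetical dual-stream encoder: separate pathways, fused by concatenation."""
    text_repr = mean_pool(self_attention(text_vecs))
    image_repr = mean_pool(self_attention(image_vecs))
    return text_repr + image_repr

# Made-up 2-dimensional embeddings standing in for caption tokens and image features.
text = [[1.0, 0.0], [0.0, 1.0]]
image_a = [[0.5, 0.5]]
image_b = [[2.0, -1.0]]

joint_a = single_stream(text, image_a)
joint_b = single_stream(text, image_b)
split_a = dual_stream(text, image_a)
split_b = dual_stream(text, image_b)
```

In this sketch, swapping the image changes the whole single-stream representation, while the text half of the dual-stream output stays the same because the text pathway never sees the image.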
