CL · AI · CV · May 20, 2025

EmoGist: Efficient In-Context Learning for Visual Emotion Understanding

arXiv:2505.14660v21 citations · h-index: 2 · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the problem of nuanced visual emotion understanding for AI systems, offering an incremental improvement over existing methods.

The paper tackles visual emotion classification by introducing EmoGist, a training-free, in-context learning method that uses context-dependent descriptions of emotion labels to improve accuracy. It reports gains of up to 12 points in micro F1 on the Memotion dataset and up to 8 points in macro F1 on the FI dataset.

In this paper, we introduce EmoGist, a training-free, in-context learning method for performing visual emotion classification with LVLMs. The key intuition of our approach is that context-dependent definitions of emotion labels could allow more accurate predictions of emotions, as the ways in which emotions manifest within images are highly context-dependent and nuanced. EmoGist pre-generates multiple descriptions of each emotion label by analyzing clusters of example images belonging to that label. At test time, we retrieve one version of the description based on the cosine similarity of the test image to the cluster centroids, and feed it together with the test image to a fast LVLM for classification. Through our experiments, we show that EmoGist allows up to 12 points of improvement in micro F1 on the multi-label Memotion dataset, and up to 8 points in macro F1 on the multi-class FI dataset.
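The test-time retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `clusters_by_label` layout, the toy 2-D embeddings, and the choice to pick one description per label are all assumptions made for illustration. In practice the embeddings would come from an image encoder, and the selected descriptions would be placed in the LVLM prompt alongside the test image.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_descriptions(test_embedding, clusters_by_label):
    """For each label, pick the description of the closest cluster.

    clusters_by_label is a hypothetical structure:
        {label: [(centroid_vector, description_text), ...]}
    """
    selected = {}
    for label, clusters in clusters_by_label.items():
        # Choose the cluster whose centroid is most similar to the test image.
        _, description = max(
            clusters,
            key=lambda pair: cosine_similarity(test_embedding, pair[0]),
        )
        selected[label] = description
    return selected

# Toy data: two pre-generated descriptions per label (assumed values).
clusters_by_label = {
    "joy": [([1.0, 0.0], "joy in outdoor scenes"),
            ([0.0, 1.0], "joy in close-up portraits")],
    "fear": [([0.7, 0.7], "fear in dark interiors"),
             ([-1.0, 0.2], "fear in crowd scenes")],
}
test_embedding = [0.9, 0.1]
chosen = retrieve_descriptions(test_embedding, clusters_by_label)
```

Because cosine similarity ignores vector length, only the direction of the test embedding relative to each centroid affects which description is chosen.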

