CV · Apr 7

AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis

arXiv:2604.05900 · 68.5
Predicted impact top 45% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the evaluation and improvement of affective capabilities in VLMs, but it is incremental: it builds on existing VLM frameworks, adding a new benchmark and a prompting method.

The paper tackles the underexplored problem of holistic Affective Image Content Analysis (AICA) in Vision-Language Models (VLMs) by introducing AICA-Bench, a benchmark with three tasks. It finds that VLMs suffer from weak intensity calibration and shallow open-ended descriptions, and proposes Grounded Affective Tree (GAT) Prompting, which reduces intensity errors and improves descriptive depth.

Vision-Language Models (VLMs) have demonstrated strong capabilities in perception, yet holistic Affective Image Content Analysis (AICA), which integrates perception, reasoning, and generation into a unified framework, remains underexplored. To address this gap, we introduce AICA-Bench, a comprehensive benchmark with three core tasks: Emotion Understanding (EU), Emotion Reasoning (ER), and Emotion-Guided Content Generation (EGCG). We evaluate 23 VLMs and identify two major limitations: weak intensity calibration and shallow open-ended descriptions. To address these issues, we propose Grounded Affective Tree (GAT) Prompting, a training-free framework that combines visual scaffolding with hierarchical reasoning. Experiments show that GAT reduces intensity errors and improves descriptive depth, providing a strong baseline for future research on affective multimodal understanding and generation.
