CL CV Sep 23, 2025

Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models

Mohammad Saim, Phan Anh Duong, Cat Luong, Aniket Bhanderi, Tianyu Jiang

arXiv:2509.19595v1, 2 citations, h-index: 1, EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the challenge of emotion analysis in vision for affect-aware applications, though it is incremental as it builds on existing models with a novel focus on body parts.

The paper tackles the problem of analyzing embodied emotions from body parts by proposing a framework that uses large vision-language models to generate multi-layered narratives. Without any fine-tuning, the framework outperforms baselines at recognizing emotions in face-masked images.

The embodiment of emotional reactions from body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily comprising descriptions that focus on the salient body parts involved in emotional reactions. We also employ attention maps and observe that contemporary models exhibit a persistent bias towards the facial region. Despite this limitation, we observe that our employed framework can effectively recognize embodied emotions in face-masked images, outperforming baselines without any fine-tuning. ELENA opens a new trajectory for embodied emotion analysis across the modality of vision and enriches modeling in an affect-aware setting.
