CV HCJan 19

Not all Blends are Equal: The BLEMORE Dataset of Blended Emotion Expressions with Relative Salience Annotations

Tim Lachmann, Alexandra Israelsson, Christina Tornberg, Teimuraz Saghinadze, Michal Balazia, Philipp Müller, Petri Laukka

arXiv:2601.13225v12.8

Originality Incremental advance

AI Analysis

This addresses the lack of datasets for blended emotion recognition, enabling more nuanced emotion analysis in AI systems, though it is incremental as it builds on existing multimodal approaches.

The paper tackles the problem of recognizing blended emotions with varying salience in videos by introducing the BLEMORE dataset, which includes over 3,000 clips with annotations for relative salience, and shows that multimodal methods achieve up to 35% accuracy for presence and 18% for salience on validation sets.

Humans often experience not just a single basic emotion at a time, but rather a blend of several emotions with varying salience. Despite the importance of such blended emotions, most video-based emotion recognition approaches are designed to recognize single emotions only. The few approaches that have attempted to recognize blended emotions typically cannot assess the relative salience of the emotions within a blend. This limitation largely stems from the lack of datasets containing a substantial number of blended emotion samples annotated with relative salience. To address this shortcoming, we introduce BLEMORE, a novel dataset for multimodal (video, audio) blended emotion recognition that includes information on the relative salience of each emotion within a blend. BLEMORE comprises over 3,000 clips from 58 actors, performing 6 basic emotions and 10 distinct blends, where each blend has 3 different salience configurations (50/50, 70/30, and 30/70). Using this dataset, we conduct extensive evaluations of state-of-the-art video classification approaches on two blended emotion prediction tasks: (1) predicting the presence of emotions in a given sample, and (2) predicting the relative salience of emotions in a blend. Our results show that unimodal classifiers achieve up to 29% presence accuracy and 13% salience accuracy on the validation set, while multimodal methods yield clear improvements, with ImageBind + WavLM reaching 35% presence accuracy and HiCMAE 18% salience accuracy. On the held-out test set, the best models achieve 33% presence accuracy (VideoMAEv2 + HuBERT) and 18% salience accuracy (HiCMAE). In sum, the BLEMORE dataset provides a valuable resource to advancing research on emotion recognition systems that account for the complexity and significance of blended emotion expressions.

View on arXiv PDF

Similar