CV Mar 13, 2025

HSEmotion Team at ABAW-8 Competition: Audiovisual Ambivalence/Hesitancy, Emotional Mimicry Intensity and Facial Expression Recognition

arXiv:2503.10399v118 citations, h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses affective behavior analysis in-the-wild for applications like human-computer interaction, but it is incremental as it builds on existing pre-trained models and simple aggregation methods.

The authors tackled the problem of predicting ambivalence/hesitancy, emotional mimicry intensity, and facial expressions in audiovisual data by combining pre-trained facial, acoustic, and text features with simple classifiers, achieving significant improvements in validation metrics over baselines.

This article presents our results for the eighth Affective Behavior Analysis in-the-Wild (ABAW) competition. We combine facial emotional descriptors extracted by pre-trained models, namely those from our EmotiEffLib library, with acoustic features and embeddings of text recognized from speech. The frame-level features are aggregated and fed into simple classifiers, e.g., a multi-layer perceptron (a feed-forward neural network with one hidden layer), to predict ambivalence/hesitancy and facial expressions. In the latter case, we also use the pre-trained facial expression recognition model to select high-score video frames, which are then excluded from processing by a domain-specific video classifier. The video-level prediction of emotional mimicry intensity is implemented by simply aggregating frame-level features and training a multi-layer perceptron. Experimental results for three tasks from the ABAW challenge demonstrate that our approach significantly increases validation metrics compared to existing baselines.
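
The two mechanisms the abstract names, aggregating frame-level features into one video-level descriptor and letting high-score frames skip the domain-specific classifier, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of mean and standard deviation as aggregation statistics, the threshold value, and the function names `aggregate_frames` and `route_frame`. The sketch does not use the EmotiEffLib API; it expects features and scores that have already been extracted.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence


def aggregate_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-frame feature vectors into one video-level descriptor.

    Assumption: the descriptor is the per-dimension mean followed by the
    per-dimension (population) standard deviation. The abstract only says
    the frame-level features are "aggregated".
    """
    if not frames:
        raise ValueError("at least one frame is required")
    columns = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


def route_frame(
    scores: Sequence[float],
    threshold: float,
    fallback: Callable[[], int],
) -> int:
    """Return a class index for one frame.

    If the pre-trained model's top score reaches the (hypothetical)
    threshold, its prediction is kept and the fallback is never called.
    Otherwise the frame is passed to the fallback, which stands in for
    the domain-specific classifier.
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return best
    return fallback()


# Two frames with two feature dimensions each.
video_descriptor = aggregate_frames([[1.0, 2.0], [3.0, 2.0]])

# A confident frame keeps the pre-trained prediction; an uncertain one
# is delegated to the fallback (here a stub that returns -1).
confident = route_frame([0.1, 0.9], threshold=0.8, fallback=lambda: -1)
uncertain = route_frame([0.5, 0.5], threshold=0.8, fallback=lambda: -1)
```

The resulting video-level descriptor would then be the input to a simple classifier such as the one-hidden-layer perceptron mentioned above. Which classifier implementation and which aggregation statistics the authors actually used is not stated in this summary.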

