CV · Mar 14, 2025

Solution for 8th Competition on Affective & Behavior Analysis in-the-wild

arXiv:2503.11115v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses facial expression analysis for affective computing applications, but appears incremental as it builds on existing datasets and models.

The paper tackles robust and accurate facial action unit (AU) detection in the wild by introducing an audio-visual multimodal method, reporting improved accuracy on the Aff-Wild2 dataset.

In this report, we present our solution for the Action Unit (AU) Detection Challenge of the 8th Competition on Affective & Behavior Analysis in-the-wild (ABAW). To achieve robust and accurate classification of facial action units under unconstrained, in-the-wild conditions, we introduce a method that leverages audio-visual multimodal data. Our method employs ConvNeXt as the image encoder and uses Whisper to extract features from Mel spectrograms. A Transformer-encoder-based feature fusion module then integrates the affective information embedded in the audio and image features. This provides rich, high-dimensional feature representations to the subsequent multilayer perceptron (MLP) trained on the Aff-Wild2 dataset, improving the accuracy of AU detection.
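The encoder, fusion and MLP pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: random arrays stand in for ConvNeXt frame embeddings and Whisper encoder outputs, all weights are random and untrained, and the single-head one-layer encoder, the widths (768, 512, 64), the 12-AU output size and the 0.5 threshold are hypothetical choices. Reading per-frame predictions off the image-token positions is likewise one plausible design that the report does not specify.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean / unit variance (no gain or bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class AudioVisualAUHead:
    """Hypothetical sketch, not the authors' code: project image and audio
    tokens to a shared width, run one single-head pre-norm Transformer encoder
    layer over the joint sequence, then apply an MLP with one sigmoid per AU."""

    def __init__(self, d_img, d_aud, d_model=64, n_aus=12, seed=0):
        rng = np.random.default_rng(seed)

        def init(fan_in, fan_out):
            # Random (untrained) weights scaled by 1/sqrt(fan_in).
            return rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))

        self.d_model = d_model
        self.W_img = init(d_img, d_model)  # image-feature projection
        self.W_aud = init(d_aud, d_model)  # audio-feature projection
        # Modality embeddings (random here; they would be learned in training).
        self.modality = rng.normal(0.0, 0.02, (2, d_model))
        self.Wq, self.Wk, self.Wv, self.Wo = (init(d_model, d_model) for _ in range(4))
        self.W_ff1, self.W_ff2 = init(d_model, 4 * d_model), init(4 * d_model, d_model)
        self.W_h1, self.W_h2 = init(d_model, d_model), init(d_model, n_aus)

    def encoder_layer(self, x):
        # Self-attention: every image token can attend to every audio token and vice versa.
        h = layer_norm(x)
        q, k, v = h @ self.Wq, h @ self.Wk, h @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(self.d_model))
        x = x + (attn @ v) @ self.Wo
        # Position-wise feed-forward sublayer with ReLU, also residual.
        h = layer_norm(x)
        return x + np.maximum(h @ self.W_ff1, 0.0) @ self.W_ff2

    def __call__(self, img_tokens, aud_tokens):
        t_v = img_tokens.shape[0]
        v = img_tokens @ self.W_img + self.modality[0]
        a = aud_tokens @ self.W_aud + self.modality[1]
        fused = self.encoder_layer(np.concatenate([v, a], axis=0))
        frames = layer_norm(fused[:t_v])  # keep outputs at the image-token (frame) positions
        logits = np.maximum(frames @ self.W_h1, 0.0) @ self.W_h2
        return 0.5 * (1.0 + np.tanh(0.5 * logits))  # sigmoid, written to avoid exp overflow


# Random stand-ins for encoder outputs; the widths are placeholders, not the paper's.
rng = np.random.default_rng(1)
img_tokens = rng.normal(size=(8, 768))   # 8 video frames, one pooled image embedding each
aud_tokens = rng.normal(size=(50, 512))  # 50 audio-encoder time steps for the same clip
model = AudioVisualAUHead(d_img=768, d_aud=512)
probs = model(img_tokens, aud_tokens)    # per-frame, per-AU probabilities
active = probs > 0.5                     # independent multi-label decision per AU
print(probs.shape)  # (8, 12)
```

Concatenating the two token sequences before self-attention is what makes this a fusion step: each frame's output mixes in audio context, so the MLP receives audio-informed features. Training would fit all weights, for example with a per-AU binary cross-entropy loss; that loss is a common choice for multi-label AU detection but is an assumption here, not a detail taken from the report.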
