CV · Apr 30, 2025

Multi-modal Transfer Learning for Dynamic Facial Emotion Recognition in the Wild

Ezra Engel, Lishan Li, Chris Hudy, Robert Schleusner

arXiv:2504.21248v1 · 1 citation · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses a challenging computer vision problem with applications such as human-computer interaction, but its contribution is incremental.

The paper tackles dynamic facial emotion recognition in the wild using multi-modal transfer learning with pretrained networks, yielding modest accuracy improvements on the DFEW dataset.

Facial expression recognition (FER) is a subset of computer vision with important applications in human-computer interaction, healthcare, and customer service. FER is a challenging problem space because accurate classification requires a model to differentiate between subtle changes in facial features. In this paper, we examine the use of multi-modal transfer learning to improve performance on a challenging video-based FER dataset, Dynamic Facial Expression in-the-Wild (DFEW). Using a combination of pretrained ResNets, OpenPose, and OmniVec networks, we explore the impact of cross-temporal, multi-modal features on classification accuracy. Ultimately, we find that these fine-tuned multi-modal feature generators modestly improve the accuracy of our transformer-based classification model.
