Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation
This work addresses the challenge of ambiguous facial expressions in real-world DFER applications, offering an incremental improvement through a novel data augmentation technique.
The paper addresses the recognition of ambiguous facial expressions in dynamic facial expression recognition (DFER) by proposing MIDAS, a soft label-based data augmentation method. Models trained with MIDAS-augmented data outperform the state-of-the-art method on the DFEW and FERV39k-Plus datasets.
Dynamic facial expression recognition (DFER) is a task that estimates emotions from facial expression video sequences. For practical applications, accurately recognizing ambiguous facial expressions, which are frequently encountered in in-the-wild data, is essential. In this study, we propose MIDAS, a data augmentation method designed to enhance DFER performance on ambiguous facial expression data by using soft labels that represent the probabilities of multiple emotion classes. MIDAS augments training data by convexly combining pairs of video frames and their corresponding emotion class labels. This approach extends mixup to soft-labeled video data, offering a simple yet highly effective method for handling ambiguity in DFER. To evaluate MIDAS, we conducted experiments on both the DFEW dataset and FERV39k-Plus, a newly constructed dataset that assigns soft labels to an existing DFER dataset. The results demonstrate that models trained with MIDAS-augmented data outperform the state-of-the-art method trained on the original dataset.
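The convex combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the clip layout (frames, height, width, channels), and the choice to draw the mixing weight from a Beta(alpha, alpha) distribution, as in standard mixup, are all assumptions made for illustration. The abstract does not say how the weight is chosen or how clips are paired.

```python
import numpy as np


def mix_soft_labeled_videos(video_a, label_a, video_b, label_b, lam):
    """Convexly combine two clips and their soft labels with weight lam in [0, 1].

    Hypothetical helper: the clips are arrays of identical shape, and the labels
    are probability vectors over the same emotion classes.
    """
    if video_a.shape != video_b.shape:
        raise ValueError("clips must share the same shape")
    if label_a.shape != label_b.shape:
        raise ValueError("soft labels must share the same shape")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # The same weight is applied to every frame and to the label vector,
    # so a mixed label built from two probability vectors still sums to 1.
    mixed_video = lam * video_a + (1.0 - lam) * video_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_video, mixed_label


def sample_lambda(rng, alpha=1.0):
    """Draw a mixing weight from Beta(alpha, alpha).

    Assumption borrowed from standard mixup, not stated in the abstract.
    """
    return float(rng.beta(alpha, alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two illustrative clips: 4 frames of 8x8 RGB, with 3 emotion classes.
    clip_a = rng.random((4, 8, 8, 3))
    clip_b = rng.random((4, 8, 8, 3))
    soft_a = np.array([0.7, 0.2, 0.1])
    soft_b = np.array([0.0, 0.5, 0.5])
    lam = sample_lambda(rng)
    clip_mix, soft_mix = mix_soft_labeled_videos(clip_a, soft_a, clip_b, soft_b, lam)
    print(clip_mix.shape, soft_mix, soft_mix.sum())
```

Because both inputs are probability vectors and the weights lam and 1 - lam sum to one, the mixed label is again a valid probability vector, which is what allows soft labels to be mixed directly without one-hot encoding.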