Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge
This work addresses the scarcity of training data for compound expression recognition, a task that matters for affective behavior analysis applications. The contribution appears incremental, however, because it builds on existing models.
The paper tackles the recognition of complex compound facial expressions in real-world scenarios. It proposes a zero-shot approach that integrates a pretrained visual language model with traditional CNN networks and evaluates it on the 6th ABAW Challenge dataset.
Conventional approaches to facial expression recognition primarily focus on classifying six basic facial expressions. Real-world situations, however, present a wider range of complex compound expressions that combine these basic ones, and recognizing them is hampered by the limited availability of comprehensive training datasets. The 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) offered unlabeled datasets containing compound expressions. In this study, we propose a zero-shot approach for recognizing compound expressions by leveraging a pretrained visual language model integrated with traditional CNN networks.
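The abstract does not say how the visual language model is queried, so the following is only a minimal sketch of one common zero-shot pattern, not the authors' method: embed the image and a text prompt for each candidate label in a shared space, then pick the label whose prompt embedding is most similar to the image embedding. The function names, the three label strings, the toy three-dimensional embeddings, and the temperature value are all illustrative assumptions; in practice the embeddings would come from a pretrained model's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.1):
    """Return the best-matching label and a probability for every label.

    label_embeddings maps each candidate label to the embedding of its
    text prompt (hypothetical stand-ins here, not real model outputs).
    """
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[name]) for name in labels]
    probs = softmax(sims, temperature)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy stand-in embeddings; a real system would obtain these from the
# text and image encoders of a pretrained visual language model.
label_embeddings = {
    "happily surprised": [0.9, 0.1, 0.0],
    "sadly angry": [0.0, 0.2, 0.9],
    "fearfully surprised": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.3, 0.1]

label, probs = zero_shot_classify(image_embedding, label_embeddings)
print(label)
```

Because no compound-expression labels are needed to fit anything, a scheme of this kind can be applied directly to unlabeled data; how the CNN components are combined with it is not described in the abstract and is left out of the sketch.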