Compound Expression Recognition via Multi Model Ensemble
This work addresses the challenge of complex human emotional expressions for applications in interpersonal interactions, but it is incremental as it applies existing ensemble techniques to a specific domain.
The paper tackles the problem of recognizing compound facial expressions by proposing an ensemble learning method that combines convolutional networks, Vision Transformers, and multi-scale local attention networks. The method achieves high accuracy on RAF-DB and supports zero-shot recognition on parts of C-EXPR-DB.
Compound Expression Recognition (CER) plays a crucial role in interpersonal interactions. Because compound expressions exist, human emotional expressions are complex, and judging them requires considering both local and global facial expressions. In this paper, we address this issue with an ensemble learning approach to Compound Expression Recognition. Specifically, we treat the task as classification and train three expression classification models based on convolutional networks, Vision Transformers, and multi-scale local attention networks. We then ensemble the models by late fusion, merging their outputs to predict the final result. Our method achieves high accuracy on RAF-DB and can recognize expressions in a zero-shot manner on certain portions of C-EXPR-DB.
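The late-fusion step can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so the code below assumes one common choice: each model's logits are converted to class probabilities with softmax, and the probabilities are combined by a weighted average. The function names (`softmax`, `late_fusion`, `predict`), the optional `weights` parameter, and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    per_model_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model outputs by a weighted average of softmax probabilities.

    Assumption: averaging probabilities is one possible late-fusion rule;
    the source does not specify which rule the authors use.
    """
    n_models = len(per_model_logits)
    if weights is None:
        # Default to equal weighting across models.
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")

    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    if any(len(p) != n_classes for p in probs):
        raise ValueError("all models must score the same set of classes")

    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]


def predict(
    per_model_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(per_model_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for one image over three classes, one row per model
# (convolutional network, Vision Transformer, multi-scale local attention).
example_logits = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [1.8, 0.4, 0.2],
]
fused_probs = late_fusion(example_logits)
predicted_class = predict(example_logits)
```

With equal weights, two of the three hypothetical models favor the first class, so the fused prediction follows them even though the second model disagrees. Non-uniform weights would let a more reliable model dominate; whether the authors weight their models is not stated in the abstract.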