Recognizing Facial Expressions in the Wild using Multi-Architectural Representations based Ensemble Learning with Distillation
This work addresses the challenge of building efficient and robust facial expression recognition systems for real-time applications, though it appears incremental in nature.
The paper tackled the problem of automatic facial expression recognition in the wild by proposing an ensemble model (EmoXNet) and a distilled version (EmoXNetLite), achieving test accuracies of 85.07% on FER2013 and 86.25% on RAF-DB for the ensemble, and 82.07% on FER2013 and 81.78% on RAF-DB for the distilled model.
Facial expressions are the most common universal forms of body language. In the past few years, automatic facial expression recognition (FER) has been an active field of research. However, it is still a challenging task due to different uncertainties and complications. Nevertheless, efficiency and performance are yet essential aspects for building robust systems. We proposed two models, EmoXNet which is an ensemble learning technique for learning convoluted facial representations, and EmoXNetLite which is a distillation technique that is useful for transferring the knowledge from our ensemble model to an efficient deep neural network using label-smoothen soft labels for able to effectively detect expressions in real-time. Both of the techniques performed quite well, where the ensemble model (EmoXNet) helped to achieve 85.07% test accuracy on FER2013 with FER+ annotations and 86.25% test accuracy on RAF-DB. Moreover, the distilled model (EmoXNetLite) showed 82.07% test accuracy on FER2013 with FER+ annotations and 81.78% test accuracy on RAF-DB. Results show that our models seem to generalize well on new data and are learned to focus on relevant facial representations for expressions recognition.