Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild
This work addresses the gap between laboratory performance and real-world deployment of facial expression recognition (FER), which matters for fields such as marketing and education. The contribution is incremental: it benchmarks existing methods rather than introducing new ones.
The authors tackle the inconsistent performance of facial expression recognition (FER) methods in practical settings by collecting a new in-the-wild dataset and evaluating 23 network architectures under a uniform protocol. The outcome is a ranking of the architectures and a set of deployment recommendations.
Facial expression recognition (FER) is a crucial part of human-computer interaction. Existing FER methods achieve high accuracy and generalization using a variety of open-source deep models and training approaches. However, these methods do not always perform well in practical settings, which are seldom explored. In this paper, we collected a new in-the-wild facial expression dataset for cross-domain validation. Twenty-three commonly used network architectures were implemented and evaluated following a uniform protocol. Moreover, various setups, in terms of input resolution, class balance management, and pre-training strategy, were tested to show how much each contributes to performance. Based on extensive experiments on three large-scale FER datasets and our practical cross-validation, we ranked the network architectures and summarized a set of recommendations for deploying deep FER methods in real scenarios. In addition, potential ethical rules, privacy issues, and regulations were discussed for practical FER applications such as marketing, education, and entertainment.
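The abstract does not say which class balance management strategy was evaluated. As an illustration only, one common option is to weight each class inversely to its frequency in the training labels. The sketch below assumes that approach; the function name `inverse_frequency_weights` and the toy label list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a dict mapping each class to total / (n_classes * class_count).

    With this scaling, the weights summed over all samples equal the
    number of samples, so rare classes count more without changing the
    overall loss scale. This is an assumed strategy, not the paper's.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Hypothetical toy labels, made up for illustration (not real dataset counts).
labels = ["happy"] * 6 + ["sad"] * 3 + ["fear"] * 1
weights = inverse_frequency_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Weights of this kind are typically passed to a weighted loss function or used to drive a resampling scheme. Which of the two works better for FER is exactly the sort of question a uniform benchmarking protocol is meant to answer.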