Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks
This work addresses the computational cost and redundancy of ensemble methods in deep learning, specifically for facial expression recognition. It is an incremental improvement with practical relevance to real-world applications.
The paper tackles the inefficiency of training ensembles of deep networks by proposing Ensembles with Shared Representations (ESRs), which reduce redundancy and computational load while maintaining diversity and generalization. On the facial expression recognition datasets AffectNet and FER+, the reported experiments suggest that ESRs reach human-level performance and outperform state-of-the-art methods.
Ensemble methods, traditionally built with independently trained, de-correlated models, have proven effective at reducing the residual generalization error, which yields robust and accurate methods for real-world applications. In the context of deep learning, however, training an ensemble of deep networks is costly and generates high redundancy, which is inefficient. In this paper, we present experiments on Ensembles with Shared Representations (ESRs) based on convolutional networks to demonstrate, quantitatively and qualitatively, their data processing efficiency and scalability to large-scale datasets of facial expressions. We show that redundancy and computational load can be dramatically reduced by varying the branching level of the ESR without loss of diversity and generalization power, both of which are important for ensemble performance. Experiments on large-scale datasets suggest that ESRs reduce the residual generalization error on the AffectNet and FER+ datasets, reach human-level performance, and outperform state-of-the-art methods on facial expression recognition in the wild using emotion and affect concepts.
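To make the effect of the branching level concrete, the following is a minimal sketch that counts convolutional parameters when the first few layers are shared across ensemble branches and the remaining layers are duplicated per branch. The channel widths, the 3x3 kernel size, the four-branch ensemble, and the names `conv_params`, `esr_param_count`, and `CHANNELS` are all illustrative assumptions; they are not taken from the paper and do not describe the authors' architecture.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one kernel x kernel convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def esr_param_count(channels, branch_level, n_branches, kernel=3):
    """Parameter count of a hypothetical ensemble with shared early layers.

    channels     -- channel widths, so layer i maps channels[i] -> channels[i + 1]
    branch_level -- number of early layers shared by all branches
                    (0 means every branch is a fully independent network)
    n_branches   -- number of ensemble members
    """
    layers = [conv_params(a, b, kernel) for a, b in zip(channels, channels[1:])]
    if not 0 <= branch_level <= len(layers):
        raise ValueError("branch_level must lie between 0 and the number of layers")
    shared = sum(layers[:branch_level])      # counted once for the whole ensemble
    per_branch = sum(layers[branch_level:])  # duplicated in every branch
    return shared + n_branches * per_branch


# Hypothetical channel widths for a small four-layer network (assumption).
CHANNELS = [3, 64, 128, 256, 512]

if __name__ == "__main__":
    for level in range(len(CHANNELS)):
        total = esr_param_count(CHANNELS, level, n_branches=4)
        print(f"branch_level={level}: {total} parameters")
```

In this toy setting, a later branching level shares more layers and so lowers the total parameter count, while an earlier one leaves more branch-specific capacity. Sharing every layer collapses the ensemble into a single network, so the count alone says nothing about diversity, which the abstract identifies as equally important for ensemble performance.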