Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition
This work addresses a domain-specific problem in computer vision for facial analysis and offers an incremental improvement: learning convolution filter sizes instead of searching for them, which reduces training cost.
The paper tackles the high training cost of Convolutional Neural Networks for facial action unit recognition by proposing an Optimized Filter Size CNN (OFS-CNN) that learns filter sizes and filter weights simultaneously. It outperforms CNNs whose filter size is chosen by exhaustive search, as well as a CNN using multiple filter sizes, and it is more efficient at test time.
Recognizing facial action units (AUs) during spontaneous facial displays is a challenging problem. Recently, Convolutional Neural Networks (CNNs) have shown promise for facial AU recognition, but they employ predefined, fixed convolution filter sizes. To achieve the best performance, the optimal filter size is usually found empirically through extensive experimental validation. Such a process incurs a high training cost, especially as the network becomes deeper. This paper proposes a novel Optimized Filter Size CNN (OFS-CNN), in which the filter sizes of all convolutional layers are learned from the training data simultaneously with the filter weights. Specifically, the filter size is defined as a continuous variable and is optimized by minimizing the training loss. Experimental results on two AU-coded spontaneous databases show that the proposed OFS-CNN can estimate the optimal filter size for varying image resolutions and outperforms traditional CNNs that use the best filter size found by exhaustive search. The OFS-CNN also outperforms a CNN using multiple filter sizes and, more importantly, is much more efficient during testing with the proposed forward-backward propagation algorithm.
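The abstract does not say exactly how a continuous filter size enters the forward pass. As a minimal sketch, assuming one simple differentiable formulation (not necessarily the paper's exact one), a square kernel of maximum size k_max can be multiplied by a soft mask whose outer ring fades in linearly as the continuous size s grows. At odd integer s the mask is an exact s x s box, and between two odd sizes the layer output is the linear interpolation of the outputs for those two sizes. The gradient with respect to s then follows from the chain rule through the mask, so s and the weights can be updated together by plain gradient descent. The helper names (size_mask, correlate2d_valid, loss_and_grads, train), the single-channel layer, the squared-error loss, and the learning rates are hypothetical choices made for illustration.

```python
import numpy as np


def size_mask(s: float, k_max: int):
    """Soft square mask for a continuous filter size s (assumed 1 <= s <= k_max).

    Hypothetical formulation: an entry at Chebyshev distance r from the centre
    gets weight clip((s + 1) / 2 - r, 0, 1), so odd integer s gives an exact
    s x s box and intermediate s interpolates linearly between odd sizes.
    Returns the mask and its derivative with respect to s.
    """
    idx = np.arange(k_max) - k_max // 2
    r = np.maximum(np.abs(idx)[:, None], np.abs(idx)[None, :])  # Chebyshev distance
    raw = (s + 1) / 2 - r
    mask = np.clip(raw, 0.0, 1.0)
    # Slope is 0.5 where the mask is strictly between 0 and 1; the kinks at odd
    # integer s are assigned slope 0, so s should start off an odd integer.
    dmask = np.where((raw > 0) & (raw < 1), 0.5, 0.0)
    return mask, dmask


def correlate2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation (what CNN layers call convolution)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * x[i:i + oh, j:j + ow]
    return out


def loss_and_grads(x, target, w, s):
    """Squared-error loss and its gradients w.r.t. the weights w and the size s."""
    k = w.shape[0]
    mask, dmask = size_mask(s, k)
    y = correlate2d_valid(x, w * mask)  # forward pass with the masked kernel
    err = y - target
    loss = 0.5 * float(np.sum(err ** 2))
    oh, ow = err.shape
    # Gradient w.r.t. the effective (masked) kernel entries.
    g_eff = np.array([[np.sum(err * x[i:i + oh, j:j + ow]) for j in range(k)]
                      for i in range(k)])
    grad_w = g_eff * mask                      # chain rule through w * mask
    grad_s = float(np.sum(g_eff * w * dmask))  # chain rule through mask(s)
    return loss, grad_w, grad_s


def train(x, target, k_max=7, s0=4.5, lr_w=1e-3, lr_s=1e-3, steps=300):
    """Jointly learn the kernel weights and the continuous size by gradient descent."""
    w = np.zeros((k_max, k_max))
    s = s0
    losses = []
    for _ in range(steps):
        loss, grad_w, grad_s = loss_and_grads(x, target, w, s)
        losses.append(loss)
        w -= lr_w * grad_w
        s = float(np.clip(s - lr_s * grad_s, 1.0, k_max))  # keep s in a valid range
    return w, s, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16))
    true_w = np.zeros((7, 7))
    true_w[2:5, 2:5] = rng.standard_normal((3, 3))  # a hidden 3x3 filter
    target = correlate2d_valid(x, true_w)
    w, s, losses = train(x, target)
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, learned size s = {s:.2f}")
```

In this toy formulation nothing forces s toward smaller values on its own, because the weights can also shrink the outer ring toward zero; the sketch only shows that a filter size defined as a continuous variable can receive gradients from the training loss. It does not attempt to reproduce the paper's forward-backward propagation algorithm.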