Computational efficient deep neural network with difference attention maps for facial action unit detection
This work provides an incremental improvement in facial action unit detection for computer vision researchers and applications requiring efficient AU analysis.
This paper proposes a computationally efficient deep neural network (CEDNN) for facial action unit (AU) detection that utilizes difference images to generate spatial attention maps. The CEDNN model outperforms traditional deep learning methods on the DISFA+ and CK+ datasets and, combined with these attention maps, achieves better results than state-of-the-art AU detection methods.
In this paper, we propose a computationally efficient, end-to-end trainable deep neural network (CEDNN) and spatial attention maps based on difference images. First, a difference image is generated by image processing. Five binary images are then obtained from the difference image using different thresholds, and these serve as spatial attention maps. We use group convolution to reduce model complexity, while skip connections and $1\times 1$ convolutions maintain good performance even though the network is not deep. A spatial attention map can be selectively fed into the input of each block, so that the feature maps focus more on the regions relevant to the target task. In addition, only the classifier parameters need to be adjusted to train for a different number of AUs, so the model extends easily to other datasets without a large increase in computation. Extensive experiments show that the proposed CEDNN clearly outperforms traditional deep learning methods on the DISFA+ and CK+ datasets. With spatial attention maps added, it also outperforms state-of-the-art AU detection methods. At the same time, the network is small, runs fast, and has low hardware requirements.
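The two ideas above that carry most of the efficiency argument, thresholded difference images and group convolution, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the difference image is computed, so the sketch assumes a pixel-wise absolute difference between a neutral-face frame and an expressive frame. The threshold values, the function names, and the layer sizes in the usage note are all hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def difference_image(neutral: Image, expressive: Image) -> Image:
    """Pixel-wise absolute difference of two equally sized grayscale images.

    Assumption: the paper only says the difference image comes from
    "image processing"; a neutral-vs-expressive difference is one guess.
    """
    if len(neutral) != len(expressive) or any(
        len(a) != len(b) for a, b in zip(neutral, expressive)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(a - b) for a, b in zip(row_n, row_e)]
        for row_n, row_e in zip(neutral, expressive)
    ]


def binary_attention_maps(
    diff: Image, thresholds: Sequence[int] = (10, 20, 30, 40, 50)
) -> List[Image]:
    """Return one binary map per threshold (1 where diff > threshold).

    The five default thresholds are illustrative values, not the paper's.
    """
    return [[[1 if px > t else 0 for px in row] for row in diff] for t in thresholds]


def conv_weight_count(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Number of weights (bias excluded) in a 2-D convolution layer.

    With `groups` > 1 each output channel sees only c_in / groups input
    channels, which divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


if __name__ == "__main__":
    neutral = [[100, 100, 100], [100, 100, 100]]
    expressive = [[100, 115, 160], [105, 125, 145]]
    diff = difference_image(neutral, expressive)
    for t, m in zip((10, 20, 30, 40, 50), binary_attention_maps(diff)):
        print(t, m)
    print(conv_weight_count(64, 64, 3), conv_weight_count(64, 64, 3, groups=4))
```

Higher thresholds keep only the pixels that change most between the two frames, so the five maps form a nested sequence from broad to tightly localized attention. For the hypothetical 64-to-64-channel, $3\times 3$ layer in the usage lines, splitting into 4 groups cuts the weight count from 36864 to 9216.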