CV, LG, NE | Oct 10, 2015

Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition?

arXiv:1510.02969v3 | 278 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the interpretability of CNNs in expression recognition, which is important for researchers and practitioners in computer vision and affective computing, though it is incremental in analyzing existing methods.

The authors tackled the problem of understanding what convolutional neural networks (CNNs) learn in facial expression recognition, showing that CNNs achieve state-of-the-art performance on two benchmarks (CK+ and TFD) and that the learned features align with Facial Action Units (FAUs).

Although convolutional neural networks (CNNs) have been the appearance-based classifier of choice in recent years, relatively few works have examined how much they can improve performance on accepted expression recognition benchmarks and, more importantly, what they actually learn. In this work, not only do we show that CNNs can achieve strong performance, but we also introduce an approach to decipher which portions of the face influence the CNN's predictions. First, we train a zero-bias CNN on facial expression data and achieve, to our knowledge, state-of-the-art performance on two expression recognition benchmarks: the extended Cohn-Kanade (CK+) dataset and the Toronto Face Dataset (TFD). We then qualitatively analyze the network by visualizing the spatial patterns that maximally excite different neurons in the convolutional layers and show how they resemble Facial Action Units (FAUs). Finally, we use the FAU labels provided in the CK+ dataset to verify that the FAUs observed in our filter visualizations indeed align with the subject's facial movements.
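The final verification step, checking whether a filter's response tracks a labelled FAU, can be illustrated with a minimal sketch. This is not the paper's own measure: it swaps in a simple difference of mean activations, and the function name `fau_alignment`, the activation values, and the label vectors are all hypothetical, invented only to show the shape of such a check.

```python
from statistics import mean


def fau_alignment(activations, fau_labels):
    """Score how strongly one filter's response tracks each FAU.

    activations: one filter response per image (hypothetical values).
    fau_labels: dict mapping an FAU name to a list of 0/1 flags, one per image.
    Returns a dict mapping each FAU name to the mean activation when the FAU
    is present minus the mean activation when it is absent.
    """
    scores = {}
    for fau, flags in fau_labels.items():
        present = [a for a, f in zip(activations, flags) if f]
        absent = [a for a, f in zip(activations, flags) if not f]
        if not present or not absent:
            continue  # both groups are needed for a comparison
        scores[fau] = mean(present) - mean(absent)
    return scores


# Toy data, invented for illustration only (not taken from CK+).
activations = [0.9, 0.8, 0.1, 0.2, 0.7, 0.0]
fau_labels = {
    "AU12": [1, 1, 0, 0, 1, 0],
    "AU4": [0, 1, 1, 0, 0, 1],
}

scores = fau_alignment(activations, fau_labels)
best_fau = max(scores, key=scores.get)  # FAU the filter tracks most closely
```

In this toy setting the filter fires mostly on images labelled with `AU12`, so that FAU receives the highest score. A real analysis would take the activations from a trained network and the labels from the dataset annotations.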
