Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions
This work addresses data scarcity in facial action detection for computer vision applications, but the contribution is incremental because it builds on existing 3D reconstruction and GAN techniques.
The paper tackles two obstacles in facial action detection, limited annotated video data and the low occurrence frequencies of many actions, by training on synthetic facial expressions generated with 3D reconstruction and GANs. A network trained on these synthetic expressions outperformed one trained on real video and surpassed state-of-the-art approaches on the FERA17 dataset.
Critical obstacles in training classifiers to detect facial actions are the limited sizes of annotated video databases and the relatively low frequencies of occurrence of many actions. To address these problems, we propose an approach that makes use of facial expression generation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest. To evaluate this approach, two deep neural networks were trained on separate datasets: one network was trained on video of synthesized facial expressions generated from FERA17; the other network was trained on unaltered video from the same database. Both networks used the same train and validation partitions and were tested on the test partition of actual video from FERA17. The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.
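The abstract does not say how the 3D mesh is aligned to a canonical view. As a minimal sketch of what such a step could look like, the Python below uses rigid Procrustes alignment (the Kabsch algorithm), which is a common choice and not necessarily the authors' method. The function name `align_to_canonical`, the assumption that both meshes share the same vertex ordering, and the toy data are all hypothetical.

```python
import numpy as np


def align_to_canonical(mesh, canonical):
    """Rigidly align `mesh` (N, 3) to `canonical` (N, 3).

    Assumption: both arrays list corresponding vertices in the same order.
    Uses the Kabsch algorithm (rotation + translation, no scaling).
    """
    mesh_centroid = mesh.mean(axis=0)
    canonical_centroid = canonical.mean(axis=0)

    # Center both point sets so only a rotation remains to be solved.
    P = mesh - mesh_centroid
    Q = canonical - canonical_centroid

    # Cross-covariance matrix and its SVD.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Guard against reflections: force a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered mesh, then move it onto the canonical centroid.
    return P @ R.T + canonical_centroid


if __name__ == "__main__":
    # Toy data: a random "canonical" mesh and a rotated, shifted copy of it.
    rng = np.random.default_rng(0)
    canonical = rng.normal(size=(50, 3))
    theta = 0.7  # head rotation about the vertical axis, in radians
    Rz = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ])
    posed = canonical @ Rz.T + np.array([1.0, -2.0, 0.5])

    aligned = align_to_canonical(posed, canonical)
    print("max residual:", np.abs(aligned - canonical).max())
```

In a pipeline like the one described, each frame's reconstructed mesh would be brought into a shared frontal pose in this way before images are synthesized, so that head pose variation does not confound the facial action units being generated.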