CV · MM · IV · Apr 16, 2020

Video Face Manipulation Detection Through Ensemble of CNNs

arXiv:2004.07676v1 · 271 citations
AI Analysis

This paper addresses the societal problem of malicious face manipulation in video, but the approach is incremental: it builds on existing CNN methods.

The paper tackles the problem of detecting manipulated faces in videos, such as deepfakes, by proposing an ensemble of CNNs based on EfficientNetB4 with attention layers and siamese training, achieving promising results on two datasets with over 119,000 videos.

In the last few years, several techniques for facial manipulation in videos have been successfully developed and made available to the masses (e.g., FaceSwap, deepfake). These methods enable anyone to easily edit faces in video sequences with incredibly realistic results and very little effort. Despite the usefulness of these tools in many fields, if used maliciously, they can have a significantly bad impact on society (e.g., fake news spreading, cyberbullying through fake revenge porn). The ability to objectively detect whether a face has been manipulated in a video sequence is therefore a task of utmost importance. In this paper, we tackle the problem of face manipulation detection in video sequences, targeting modern facial manipulation techniques. In particular, we study the ensembling of different trained Convolutional Neural Network (CNN) models. In the proposed solution, different models are obtained starting from a base network (i.e., EfficientNetB4) by making use of two different concepts: (i) attention layers; (ii) siamese training. We show that combining these networks leads to promising face manipulation detection results on two publicly available datasets with more than 119,000 videos.
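
The abstract does not say how the individual networks are combined, so the following is only a minimal sketch of one common way to ensemble detectors, under stated assumptions: each model emits one raw score (logit) per extracted face frame, scores are averaged across models and then across frames, and a sigmoid maps the result to a probability. The function name `ensemble_video_score` and the input layout are hypothetical and are not taken from the paper or its code.

```python
import math
from statistics import mean
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_video_score(frame_logits_per_model: Sequence[Sequence[float]]) -> float:
    """Combine per-frame logits from several models into one video-level score.

    frame_logits_per_model[m][f] is the (hypothetical) logit that model m
    assigns to face frame f. Averaging is an assumption made for this
    sketch, not the combination rule confirmed by the paper.
    """
    if not frame_logits_per_model:
        raise ValueError("at least one model is required")
    n_frames = len(frame_logits_per_model[0])
    if n_frames == 0:
        raise ValueError("at least one frame is required")
    if any(len(model) != n_frames for model in frame_logits_per_model):
        raise ValueError("all models must score the same frames")

    # Step 1: average the logits of all models for each frame.
    per_frame = [
        mean(model[f] for model in frame_logits_per_model)
        for f in range(n_frames)
    ]
    # Step 2: average over frames, then convert to a probability.
    return sigmoid(mean(per_frame))


if __name__ == "__main__":
    # Two hypothetical models scoring the same two frames.
    score = ensemble_video_score([[2.0, 1.0], [0.0, 1.0]])
    print(f"estimated probability that the video is manipulated: {score:.3f}")
```

Averaging logits rather than thresholded decisions keeps each model's confidence in play; a weighted average or a learned combiner would be equally plausible alternatives.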

Code Implementations: 3 repos
