Multi-Channel Cross Modal Detection of Synthetic Face Images
This work addresses the need for reliable detection of synthetic face images to combat misinformation, but the contribution is incremental, since it builds on existing architectures and changes mainly the loss function.
The paper tackles the detection of synthetic face images generated by advanced deep learning models, which are hard to distinguish from real images and can be used to spread misinformation. It proposes a multi-channel architecture that analyzes both the frequency and visible spectra and is supervised using Cross Modal Focal Loss; in cross-model experiments this architecture, in general, achieves the most competitive performance among the compared architectures.
Synthetically generated face images have been shown to be indistinguishable from real images by humans and can therefore lead to a lack of trust in digital content, as they can, for instance, be used to spread misinformation. The need to develop algorithms for detecting entirely synthetic face images is thus apparent. Of particular interest are images generated by state-of-the-art deep learning-based models, as these exhibit a high level of visual realism. Recent works have demonstrated that detecting such synthetic face images under realistic circumstances remains difficult, because new and improved generative models are proposed at a rapid pace and arbitrary image post-processing can be applied. In this work, we propose a multi-channel architecture for detecting entirely synthetic face images which analyses information in both the frequency and visible spectra using Cross Modal Focal Loss. We compare the proposed architecture with several related architectures trained using Binary Cross Entropy and show in cross-model experiments that the proposed architecture supervised using Cross Modal Focal Loss, in general, achieves the most competitive performance.
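
To make the supervision signal concrete, the following is a minimal sketch of a cross modal focal loss for a two-branch classifier, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the weighting function follows the commonly cited formulation of Cross Modal Focal Loss from face anti-spoofing, and the function names (`cmfl`, `bce`, `total_loss`), the branch names, and the default values of `alpha`, `gamma` and `lam` are hypothetical placeholders that the abstract does not specify.

```python
import math


def bce(p_t, eps=1e-7):
    """Binary cross entropy for one sample, given the probability p_t
    that the model assigns to the true class."""
    return -math.log(max(p_t, eps))


def cmfl(p_t, q_t, alpha=1.0, gamma=3.0, eps=1e-7):
    """Cross modal focal loss for one sample, from the view of the branch
    that assigns probability p_t to the true class.

    q_t is the true-class probability of the other branch.
    alpha and gamma are assumed placeholder defaults.
    """
    # Harmonic mean of both confidences, scaled by the other branch's
    # confidence: w is close to 1 only when both branches are confident.
    w = q_t * (2.0 * p_t * q_t) / (p_t + q_t + eps)
    # Focal-style modulation: the loss is down-weighted when w is large.
    return -alpha * (1.0 - w) ** gamma * math.log(max(p_t, eps))


def total_loss(p_joint, p_visible, p_frequency, lam=0.5):
    """Combine BCE on a joint head with symmetric CMFL on the two branches.

    The joint head and the mixing weight lam are assumptions made for
    this sketch.
    """
    auxiliary = cmfl(p_visible, p_frequency) + cmfl(p_frequency, p_visible)
    return (1.0 - lam) * bce(p_joint) + lam * auxiliary


if __name__ == "__main__":
    # Both branches confident: the CMFL term is strongly down-weighted.
    print(cmfl(0.9, 0.9))
    # Other branch unsure: the CMFL term stays close to plain BCE.
    print(cmfl(0.9, 0.1), bce(0.9))
```

The design intuition is that each branch is penalised less on samples that both branches already classify confidently, so training effort shifts to samples where one spectrum is uninformative and the other branch has to carry the decision.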