CV | AI | Apr 30, 2024

Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

arXiv:2404.19171v1 | 13 citations | h-index: 12 | Has Code | ICME
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting diverse deepfakes across modalities, which is crucial for security applications, though the contribution appears incremental because it builds on existing detection frameworks.

The paper tackles generalizable detection of cross-modal deepfakes by explicitly learning cross-modal correlations, and reports better generalizability than state-of-the-art methods on the CMDFD and FakeAVCeleb datasets.

With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes. While effective in their specific modalities, traditional detection methods fall short in addressing the generalizability of detection across diverse cross-modal deepfakes. This paper aims to explicitly learn potential cross-modal correlation to enhance deepfake detection towards various generation scenarios. Our approach introduces a correlation distillation task, which models the inherent cross-modal correlation based on content information. This strategy helps to prevent the model from overfitting merely to audio-visual synchronization. Additionally, we present the Cross-Modal Deepfake Dataset (CMDFD), a comprehensive dataset with four generation methods to evaluate the detection of diverse cross-modal deepfakes. The experimental results on CMDFD and FakeAVCeleb datasets demonstrate the superior generalizability of our method over existing state-of-the-art methods. Our code and data can be found at https://github.com/ljj898/CMDFD-Dataset-and-Deepfake-Detection.
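
The abstract does not spell out the correlation distillation objective, so the following is only an illustrative sketch of what such a task could look like, not the authors' implementation. It assumes per-frame audio and visual features from the detector (the "student") and per-frame content features from frozen content models (the "teacher"), builds a frame-by-frame audio-visual similarity matrix for each, and penalizes the KL divergence between them. The function names, the use of cosine similarity, the `temperature` parameter, and the KL formulation are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a (T, D) and b (T, D)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def correlation_distillation_loss(audio_feat, visual_feat,
                                  audio_content, visual_content,
                                  temperature=0.1):
    """Hypothetical distillation loss (an assumption, not the paper's formula).

    Student correlation: similarity between detector audio and visual features.
    Teacher correlation: similarity between content features of each modality.
    Returns the mean row-wise KL(teacher || student).
    """
    student = cosine_similarity_matrix(audio_feat, visual_feat) / temperature
    teacher = cosine_similarity_matrix(audio_content, visual_content) / temperature
    log_p = log_softmax(teacher, axis=1)  # teacher distribution per audio frame
    log_q = log_softmax(student, axis=1)  # student distribution per audio frame
    kl_per_row = np.sum(np.exp(log_p) * (log_p - log_q), axis=1)
    return float(np.mean(kl_per_row))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D = 5, 8  # 5 frames, 8-dimensional features (toy sizes)
    audio, visual = rng.normal(size=(T, D)), rng.normal(size=(T, D))
    audio_c, visual_c = rng.normal(size=(T, D)), rng.normal(size=(T, D))
    print(correlation_distillation_loss(audio, visual, audio_c, visual_c))
```

Because the target in this sketch is a full frame-to-frame correlation pattern derived from content, rather than a single in-sync/out-of-sync label, it is one plausible way to discourage a detector from relying on audio-visual synchronization alone, which is the motivation the abstract gives.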

Code Implementations: 1 repo
