CV | Feb 20, 2024

CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection

arXiv:2402.12927v1 | 88 citations | h-index: 29 | ICMR
Originality: Incremental advance
AI Analysis

The paper addresses the need for effective detection mechanisms to mitigate the risks of synthetic content. Its contribution is incremental, since it builds on existing adaptation methods.

The paper tackles universal deepfake detection by adapting vision-language models such as CLIP. It shows that retaining the textual component improves performance: the prompt-tuning approach gains 5.01% mAP and 6.61% accuracy over the previous SOTA while using less than one third of the training data (200k images instead of 720k).

The recent advancements in Generative Adversarial Networks (GANs) and the emergence of Diffusion models have significantly streamlined the production of highly realistic and widely accessible synthetic content. As a result, there is a pressing need for effective general-purpose detection mechanisms to mitigate the potential risks posed by deepfakes. In this paper, we explore the effectiveness of pre-trained vision-language models (VLMs) when paired with recent adaptation methods for universal deepfake detection. Following previous studies in this domain, we employ only a single dataset (ProGAN) in order to adapt CLIP for deepfake detection. However, in contrast to prior research, which relies solely on the visual part of CLIP while ignoring its textual component, our analysis reveals that retaining the text part is crucial. Consequently, the simple and lightweight Prompt Tuning-based adaptation strategy that we employ outperforms the previous SOTA approach by 5.01% mAP and 6.61% accuracy while utilizing less than one third of the training data (200k images as compared to 720k). To assess the real-world applicability of our proposed models, we conduct a comprehensive evaluation across various scenarios. This involves rigorous testing on images sourced from 21 distinct datasets, including those generated by GAN-based, Diffusion-based, and commercial tools.
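To make the role of the text branch concrete, here is a minimal sketch of how a CLIP-style model can score an image against "real" and "fake" text embeddings. It is not the authors' implementation. The vectors are hypothetical toy values rather than real CLIP features, the helper names (`cosine`, `softmax`, `classify`) are illustrative, and the `logit_scale` of 100 is an assumed temperature. In prompt tuning as commonly described, the text embeddings would come from the frozen text encoder applied to learnable context tokens plus a class name, and only those context tokens would be trained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feature, text_features, logit_scale=100.0):
    """Class probabilities from scaled image-text cosine similarities."""
    logits = [logit_scale * cosine(image_feature, t) for t in text_features]
    return softmax(logits)


# Hypothetical toy embeddings standing in for text-encoder outputs.
# With prompt tuning these would change as the context tokens are learned.
text_features = {"real": [1.0, 0.0, 0.0], "fake": [0.0, 1.0, 0.0]}

# Hypothetical image embedding standing in for the image-encoder output.
image_feature = [0.2, 0.9, 0.1]

class_names = list(text_features)
probs = classify(image_feature, [text_features[c] for c in class_names])
label = class_names[probs.index(max(probs))]
print(label, probs)
```

Dropping the text branch would mean replacing `text_features` with a separately trained classifier head on the image feature, which is the design the abstract contrasts against.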

Code Implementations: 1 repo