CV · Mar 19, 2025

Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection

arXiv:2503.14853v2 · 25 citations · h-index: 12 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses deepfake detection, a problem that matters for security and media integrity. It is an incremental advance: it adapts large vision-language models to one specific domain.

The paper adapts Large Vision-Language Models (LVLMs) to deepfake detection with a framework that combines a Knowledge-guided Forgery Detector, a Forgery Prompt Learner, and an LLM. It reports state-of-the-art generalization performance on multiple benchmarks.

Current Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in understanding multimodal data, but their potential remains underexplored for deepfake detection due to the misalignment of their knowledge and forensics patterns. To this end, we present a novel framework that unlocks LVLMs' potential capabilities for deepfake detection. Our framework includes a Knowledge-guided Forgery Detector (KFD), a Forgery Prompt Learner (FPL), and a Large Language Model (LLM). The KFD is used to calculate correlations between image features and pristine/deepfake image description embeddings, enabling forgery classification and localization. The outputs of the KFD are subsequently processed by the Forgery Prompt Learner to construct fine-grained forgery prompt embeddings. These embeddings, along with visual and question prompt embeddings, are fed into the LLM to generate textual detection responses. Extensive experiments on multiple benchmarks, including FF++, CDF2, DFD, DFDCP, DFDC, and DF40, demonstrate that our scheme surpasses state-of-the-art methods in generalization performance, while also supporting multi-turn dialogue capabilities.
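
The abstract says the KFD computes correlations between image features and pristine/deepfake description embeddings, and uses them for both classification and localization. The sketch below illustrates that general idea in a CLIP-like style; it is not the authors' implementation. The function names, the cosine-similarity scoring, the temperature value, the mean-pooled image score, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def forgery_map(patch_features, pristine_emb, deepfake_emb, temperature=0.1):
    """Per-patch probability of the 'deepfake' description (hypothetical helper).

    Each patch is compared with both text embeddings, and a two-way softmax
    over the scaled similarities gives the probability of 'deepfake'.
    """
    probs = []
    for feat in patch_features:
        s_real = cosine(feat, pristine_emb) / temperature
        s_fake = cosine(feat, deepfake_emb) / temperature
        # A two-way softmax reduces to a sigmoid of the logit difference.
        probs.append(1.0 / (1.0 + math.exp(s_real - s_fake)))
    return probs


def image_score(patch_probs):
    """Image-level forgery score as the mean patch probability (an assumption)."""
    return sum(patch_probs) / len(patch_probs)


# Toy example: 2-D stand-ins for the text and patch embeddings.
pristine_emb = [1.0, 0.0]
deepfake_emb = [0.0, 1.0]
patches = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # the third patch looks forged

heatmap = forgery_map(patches, pristine_emb, deepfake_emb)
score = image_score(heatmap)
```

Here `heatmap` plays the role of a coarse localization map (one value per patch) and `score` the role of the classification output. In the paper's pipeline, outputs of this kind are what the Forgery Prompt Learner turns into forgery prompt embeddings for the LLM.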

