CV · CY · Sep 26, 2025

Training-Free Multimodal Deepfake Detection via Graph Reasoning

arXiv:2509.21774v1 · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting manipulated content across visual, textual, and auditory modalities to improve information reliability. It is an incremental advance that adapts existing models to a specific task.

The paper tackles the problem of multimodal deepfake detection by proposing a training-free framework that enhances large vision-language models to better capture subtle forgery cues and cross-modal inconsistencies, achieving performance gains across four forgery types without fine-tuning.

Multimodal deepfake detection (MDD) aims to uncover manipulations across visual, textual, and auditory modalities, thereby reinforcing the reliability of modern information systems. Although large vision-language models (LVLMs) exhibit strong multimodal reasoning, their effectiveness in MDD is limited by challenges in capturing subtle forgery cues, resolving cross-modal inconsistencies, and performing task-aligned retrieval. To this end, we propose Guided Adaptive Scorer and Propagation In-Context Learning (GASP-ICL), a training-free framework for MDD. GASP-ICL employs a pipeline to preserve semantic relevance while injecting task-aware knowledge into LVLMs. We leverage an MDD-adapted feature extractor to retrieve aligned image-text pairs and build a candidate set. We further design the Graph-Structured Taylor Adaptive Scorer (GSTAS) to capture cross-sample relations and propagate query-aligned signals, producing discriminative exemplars. This enables precise selection of semantically aligned, task-relevant demonstrations, enhancing LVLMs for robust MDD. Experiments on four forgery types show that GASP-ICL surpasses strong baselines, delivering gains without LVLM fine-tuning.
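
The retrieve-then-rescore idea in the abstract can be illustrated with a small sketch. This is not the paper's GSTAS; the abstract does not give its formulation. The sketch assumes each image-text pair is already encoded as a single embedding vector, retrieves the top-k candidates by cosine similarity, and then propagates query similarity over a candidate graph using a truncated power series (a generic stand-in for the graph-structured scorer). All function names, `alpha`, and `order` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_candidates(query, pool, k):
    """Return indices of the k pool embeddings most similar to the query."""
    ranked = sorted(range(len(pool)), key=lambda i: cosine(query, pool[i]), reverse=True)
    return ranked[:k]


def propagate_scores(query, cands, alpha=0.5, order=3):
    """Assumed scorer: s = sum_{t=0..order} (alpha * W)^t s0.

    s0 holds query-to-candidate similarities; W is a row-normalized,
    non-negative candidate-to-candidate affinity matrix without self-loops.
    """
    n = len(cands)
    s0 = [max(cosine(query, c), 0.0) for c in cands]
    W = [[max(cosine(cands[i], cands[j]), 0.0) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for row in W:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    scores = list(s0)
    term = list(s0)
    for _ in range(order):
        # Next series term: one more hop of similarity spread over the graph.
        term = [alpha * sum(W[i][j] * term[j] for j in range(n)) for i in range(n)]
        scores = [s + t for s, t in zip(scores, term)]
    return scores


def select_demonstrations(query, pool, k=4, m=2):
    """Retrieve k candidates, rescore them on the graph, keep the top m pool indices."""
    cand_idx = retrieve_candidates(query, pool, k)
    scores = propagate_scores(query, [pool[i] for i in cand_idx])
    best = sorted(range(len(cand_idx)), key=lambda i: scores[i], reverse=True)
    return [cand_idx[i] for i in best[:m]]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoded image-text pairs.
    pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(select_demonstrations([1.0, 0.0], pool, k=3, m=2))
```

In a pipeline of the kind the abstract describes, the selected indices would point to the labeled exemplars placed in the LVLM prompt as in-context demonstrations. The propagation step lets a candidate gain score when its neighbors are also close to the query, which plain top-k retrieval ignores.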
