CV | AI | Aug 28, 2025

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, Zhen Lei

arXiv:2508.21048v1 | 20.416 citations | h-index: 23

Originality: Highly original

AI Analysis

The paper addresses practical deepfake detection for security and verification applications, and it presents a novel method rather than an incremental improvement.

The paper tackles deepfake detection in real-world scenarios by introducing HydraFake, a dataset built for hierarchical generalization testing, and Veritas, a detector based on a multi-modal large language model that uses pattern-aware reasoning. Previous detectors fall short on unseen forgeries and data domains, whereas Veritas achieves significant gains across out-of-distribution scenarios.

Deepfake detection remains a formidable challenge due to the complex and evolving nature of fake content in real-world scenarios. However, existing academic benchmarks diverge sharply from industrial practice: they typically feature homogeneous training sources and low-quality testing images, which hinders the practical deployment of current detectors. To close this gap, we introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake includes diverse deepfake techniques and in-the-wild forgeries, along with rigorous training and evaluation protocols that cover unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose Veritas, a multi-modal large language model (MLLM) based deepfake detector. Departing from vanilla chain-of-thought (CoT), we introduce pattern-aware reasoning, which involves critical reasoning patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize these deepfake reasoning capabilities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Veritas achieves significant gains across different out-of-distribution (OOD) scenarios and is capable of delivering transparent and faithful detection outputs.

