CV · AI · LG · May 24

Cross-Domain Generalization Limits of Vision Foundation Models in Facial Deepfake Detection

arXiv:2605.24965 · 54.3 · Has Code
Predicted impact: top 64% in CV · last 90 days
Originality: Synthesis-oriented
AI Analysis

For digital forensics researchers, this work highlights the limitations of current foundation models in generalizing to unseen deepfake manipulations, particularly localized edits.

This paper evaluates the cross-domain generalization of Vision Foundation Models for facial deepfake detection, finding that while they perform well on full face synthesis, they struggle with localized face editing techniques, revealing fundamental limitations in linear probe evaluation.

The rapid evolution of generative models has enabled the creation of hyper-realistic facial deepfakes, exposing a critical vulnerability in modern digital forensics: the inability of detectors to generalize to unseen manipulation techniques. Traditional networks suffer from representation collapse, overfitting to the localized artifact fingerprints of specific training generators. This work investigates whether modern Vision Foundation Models can serve as generalizable, out-of-the-box feature extractors capable of tracking forensic anomalies across entirely unseen generative manifolds. We conduct a systematic cross-domain evaluation comparing three foundational learning paradigms: fully supervised macro-semantic features (RoPE-ViT), pure self-supervised geometric features (DINOv3), and multi-teacher agglomerative representations (NVIDIA C-RADIOv4-H). By freezing each backbone and training only a downstream linear probe, we map the performance limitations of these architectures on the challenging DF40 benchmark. Our empirical findings expose the intrinsic trade-offs between pre-training paradigms and parameter scale, showing that while foundation models retain high discriminative capability for entire face synthesis, localized face editing techniques expose fundamental boundaries of linear-probe evaluation. Source code and model weights are available at http://github.com/mribrahim/deepfake
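
The abstract gives no implementation details, so the following is only a minimal sketch of what a frozen-backbone linear-probe protocol can look like. Everything in it is an assumption made for illustration: the "backbone" is a hypothetical stand-in (a fixed random projection with tanh), not RoPE-ViT, DINOv3, or C-RADIOv4-H; the data are synthetic vectors, not DF40 images; and the names `make_frozen_backbone`, `make_split`, `train_linear_probe`, and `run_experiment` are invented here. The accuracy gap it prints is built into the toy data by construction and says nothing about the paper's measured results.

```python
import math
import random


def make_frozen_backbone(in_dim, feat_dim, rng):
    """Hypothetical stand-in for a frozen foundation model: a fixed random
    projection followed by tanh. Its weights are never updated."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def extract(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return extract


def make_split(n_per_class, in_dim, shifted_dims, rng, shift=2.0):
    """Toy data: 'real' samples (label 0) are standard normal; 'fake' samples
    (label 1) add a shift on the first `shifted_dims` inputs. Shifting all
    inputs loosely mimics whole-face synthesis, one input a localized edit."""
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
            if label == 1:
                for j in range(shifted_dims):
                    x[j] += shift
            samples.append(x)
            labels.append(label)
    return samples, labels


def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, epochs=300, lr=0.2):
    """Logistic regression trained by full-batch gradient descent.
    Only the probe weights (w, b) are learned; features stay fixed."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * f[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, features, labels):
    correct = 0
    for f, y in zip(features, labels):
        pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)


def run_experiment(seed=0, in_dim=8, feat_dim=16):
    rng = random.Random(seed)
    extract = make_frozen_backbone(in_dim, feat_dim, rng)
    # Train and "seen" test splits share the same manipulation (all inputs shifted).
    train_x, train_y = make_split(100, in_dim, in_dim, rng)
    seen_x, seen_y = make_split(100, in_dim, in_dim, rng)
    # The "unseen" split uses a different, localized manipulation (one input shifted).
    unseen_x, unseen_y = make_split(100, in_dim, 1, rng)
    w, b = train_linear_probe([extract(x) for x in train_x], train_y)
    seen_acc = accuracy(w, b, [extract(x) for x in seen_x], seen_y)
    unseen_acc = accuracy(w, b, [extract(x) for x in unseen_x], unseen_y)
    return seen_acc, unseen_acc


if __name__ == "__main__":
    seen_acc, unseen_acc = run_experiment()
    print(f"seen manipulation accuracy:   {seen_acc:.3f}")
    print(f"unseen manipulation accuracy: {unseen_acc:.3f}")
```

In a real setup the `extract` function would be replaced by a forward pass through a pretrained model with gradients disabled, and the probe would typically be fit with an established library rather than hand-written gradient descent. The structure stays the same: features are computed once from the frozen model, only the linear head is trained on one manipulation family, and the head is then scored on a manipulation family it never saw.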
