CV, MM | Oct 31, 2025

Referee: Reference-aware Audiovisual Deepfake Detection

arXiv:2510.27475v1 | h-index: 2 | Has Code

Originality: Highly original

AI Analysis

This work addresses the threat that advanced deepfakes pose to security and media integrity, offering a novel approach that improves generalization rather than delivering only incremental gains.

The paper tackles the poor generalization of audiovisual deepfake detectors to unseen forgeries by proposing Referee, a reference-aware method that uses speaker-specific cues from one-shot examples to achieve state-of-the-art performance on cross-dataset and cross-language evaluations.

Deepfakes generated by advanced generative models pose rapidly growing threats, yet existing audiovisual deepfake detection approaches struggle to generalize to unseen forgeries. We propose a novel reference-aware audiovisual deepfake detection method, called Referee. Speaker-specific cues from only one-shot examples are leveraged to detect manipulations beyond spatiotemporal artifacts. By matching and aligning identity-related queries from reference and target content into cross-modal features, Referee jointly reasons about audiovisual synchrony and identity consistency. Extensive experiments on FakeAVCeleb, FaceForensics++, and KoDF demonstrate that Referee achieves state-of-the-art performance on cross-dataset and cross-language evaluation protocols. Experimental results highlight the importance of cross-modal identity verification for future deepfake detection. The code is available at https://github.com/ewha-mmai/referee.
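To make the idea of jointly scoring identity consistency and audiovisual synchrony concrete, here is a toy sketch. It is not the Referee architecture, which the abstract describes only at a high level. The function names (`cosine`, `score_clip`), the fixed-length embedding inputs, and the weighted-sum fusion with `w_identity` are all assumptions made for illustration. A real system would obtain these embeddings from learned encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_clip(ref_id, tgt_id, audio_frames, video_frames, w_identity=0.5):
    """Hypothetical fusion score: higher suggests a more plausible real clip.

    ref_id / tgt_id: identity embeddings of the one-shot reference and the target.
    audio_frames / video_frames: per-frame embeddings assumed to share one space.
    w_identity: illustrative weight, not a value taken from the paper.
    """
    if not audio_frames or len(audio_frames) != len(video_frames):
        raise ValueError("audio and video need the same non-zero number of frames")
    # Identity consistency: does the target look/sound like the reference speaker?
    identity = cosine(ref_id, tgt_id)
    # Synchrony: average per-frame agreement between the audio and video streams.
    sync = sum(cosine(a, v) for a, v in zip(audio_frames, video_frames)) / len(audio_frames)
    return w_identity * identity + (1.0 - w_identity) * sync
```

In this sketch, a clip whose target identity diverges from the reference scores lower even when its audio and video remain in sync, which is the kind of cue the abstract says goes beyond spatiotemporal artifacts.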

