CV, MM, SD, AS | Jul 16, 2024

Statistics-aware Audio-visual Deepfake Detector

arXiv:2407.11650v2 | 9 citations | h-index: 27
AI Analysis

This work addresses the need for more efficient and effective deepfake detection tools for security and media verification, though it appears incremental, since it builds on existing synchronization-based methods.

The paper tackles audio-visual deepfake detection by proposing an enhanced method that combines a statistical feature loss, a waveform-based audio representation, and shallower networks, and it reports improved performance on the DFDC and FakeAVCeleb datasets.

In this paper, we propose an enhanced audio-visual deepfake detection method. Recent methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features. Although they have shown promising results, they are based on the maximization/minimization of isolated feature distances without considering feature statistics. Moreover, they rely on cumbersome deep learning architectures and are heavily dependent on empirically fixed hyperparameters. Herein, to overcome these limitations, we propose: (1) a statistical feature loss to enhance the discrimination capability of the model, instead of relying solely on feature distances; (2) using the waveform to describe the audio, as a replacement for frequency-based representations; (3) a post-processing normalization of the fakeness score; (4) the use of a shallower network to reduce the computational complexity. Experiments on the DFDC and FakeAVCeleb datasets demonstrate the relevance of the proposed method.
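
The abstract does not spell out the loss or the normalization, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a synchronization-based fakeness score is the mean Euclidean distance between temporally aligned audio and visual feature chunks, that "feature statistics" can be illustrated by the mean and standard deviation of those distances, and that the post-processing step is a simple min-max rescaling. All function names, feature shapes, and the synthetic data are hypothetical.

```python
import numpy as np

def chunk_distances(audio_feats, visual_feats):
    """Euclidean distance between aligned audio and visual features, per chunk.

    Both inputs have shape (num_chunks, feat_dim); this layout is an assumption.
    """
    return np.linalg.norm(audio_feats - visual_feats, axis=-1)

def fakeness_score(audio_feats, visual_feats):
    """Mean per-chunk distance for one video; larger means less synchronized."""
    return float(chunk_distances(audio_feats, visual_feats).mean())

def distance_statistics(distances):
    """Mean and standard deviation of a set of distances.

    Stands in for the 'feature statistics' idea; the paper's actual loss
    is not given in the abstract.
    """
    return float(np.mean(distances)), float(np.std(distances))

def minmax_normalize(scores, eps=1e-12):
    """Rescale scores to roughly [0, 1] (assumed form of the normalization)."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + eps)

# Synthetic stand-in features: 10 chunks per video, 16-dimensional embeddings.
rng = np.random.default_rng(0)
audio = rng.normal(size=(10, 16))
visual_real = audio + 0.1 * rng.normal(size=(10, 16))  # well synchronized
visual_fake = rng.normal(size=(10, 16))                # unrelated to the audio

real_score = fakeness_score(audio, visual_real)
fake_score = fakeness_score(audio, visual_fake)
real_stats = distance_statistics(chunk_distances(audio, visual_real))
fake_stats = distance_statistics(chunk_distances(audio, visual_fake))
normalized = minmax_normalize([real_score, fake_score])
```

In this toy setup the unsynchronized pair receives the larger score, and the normalization maps the two scores onto a common scale so that a single threshold can be applied across videos. A real system would compute the features with learned audio and visual encoders rather than random vectors.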
