CV · May 13, 2025

FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units

arXiv:2505.08294v2 · 2 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the threat that realistic deepfakes pose to security and media integrity, offering an incremental advance in multimodal detection.

The paper tackles the problem of detecting audio-visual deepfakes by proposing a framework that uses facial action units and fine-grained frame-wise comparisons, achieving state-of-the-art performance with up to 4.83% improvement over existing methods.

The rapid evolution of generative AI has increased the threat of realistic audio-visual deepfakes, demanding robust detection methods. Existing solutions primarily address unimodal (audio or visual) forgeries but struggle with multimodal manipulations due to inadequate handling of heterogeneous modality features and poor generalization across datasets. To this end, we propose a novel framework called FauForensics that introduces biologically invariant facial action units (FAUs), quantitative descriptors of facial muscle activity linked to emotion physiology. FAUs serve as forgery-resistant representations that reduce domain dependency while capturing subtle dynamics often disrupted in synthetic content. In addition, instead of comparing entire video clips as in prior works, our method computes fine-grained frame-wise audiovisual similarities via a dedicated fusion module augmented with learnable cross-modal queries. The module dynamically aligns temporal-spatial lip-audio relationships while mitigating multi-modal feature heterogeneity issues. Experiments on FakeAVCeleb and LAV-DF show state-of-the-art (SOTA) performance and superior cross-dataset generalizability, with an average improvement of up to 4.83% over existing methods.
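
To make the contrast between clip-level and frame-wise comparison concrete, here is a minimal sketch in plain Python. It is not the paper's fusion module: it assumes per-frame audio and visual feature vectors that are already time-aligned and simply takes their cosine similarity, whereas the paper uses learnable cross-modal queries. The function names and the toy feature values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def framewise_similarity(audio_feats, visual_feats):
    """One similarity score per time step (assumes frame-aligned features)."""
    if len(audio_feats) != len(visual_feats):
        raise ValueError("audio and visual sequences must be frame-aligned")
    return [cosine(a, v) for a, v in zip(audio_feats, visual_feats)]

def clip_similarity(audio_feats, visual_feats):
    """Single score from mean-pooled features (a clip-level baseline)."""
    def mean_pool(seq):
        n = len(seq)
        return [sum(col) / n for col in zip(*seq)]
    return cosine(mean_pool(audio_feats), mean_pool(visual_feats))

# Toy data (hypothetical): four frames, only frame 2 has mismatched features.
audio = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

frame_scores = framewise_similarity(audio, visual)
clip_score = clip_similarity(audio, visual)
print(frame_scores)          # per-frame scores expose the mismatched frame
print(round(clip_score, 4))  # pooled score stays high despite the mismatch
```

In this toy case the pooled clip score stays close to 1 while the per-frame scores drop to 0 at the mismatched frame, which illustrates why a localized manipulation can be easier to detect at frame granularity.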
