CV | AI | May 8

Exposing and Mitigating Temporal Attack in Deepfake Video Detection

arXiv:2605.07398 | 57.4
AI Analysis

This work addresses a critical vulnerability in deepfake video detection for security applications, but the proposed method is an incremental improvement over existing defenses.

The paper finds that spatiotemporal deepfake detectors overfit to fragile temporal spectrum cues, which leaves them vulnerable to evasion attacks. To address this, the authors propose SpInShield, a defense framework that improves robustness and achieves a 21.30 percentage point AUC gain over the strongest baseline under simulated amplitude spectral attacks.

Although spatiotemporal deepfake detectors achieve high AUC, our experiments reveal that they are susceptible to evasion attacks. These models tend to overfit to fragile temporal spectrum cues rather than learning robust semantic causality. To mitigate this vulnerability, we propose SpInShield, a temporal spectral-invariant defense framework explicitly designed to decouple semantic motion from manipulable spectral artifacts. SpInShield uses a learnable spectral adversary that dynamically synthesizes severe spectral deformations, simulating extreme attack scenarios. By employing a shortcut suppression optimization strategy, it compels the encoder to extract reliable forensic cues while purging unstable spectral statistics from the latent space. Experiments show that SpInShield achieves competitive performance on widely used datasets and outperforms the strongest baseline by 21.30 percentage points in AUC under simulated amplitude spectral attacks.
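To make the idea of an amplitude spectral attack along the time axis concrete, here is a minimal sketch. It is not the paper's attack and not SpInShield's learnable adversary. It assumes a fixed random perturbation: take the FFT of a clip over its frame axis, rescale the amplitude of each temporal frequency, and keep the phase unchanged. The function name `perturb_temporal_amplitude`, the `strength` parameter, and the log-normal scaling are all hypothetical choices made for illustration.

```python
import numpy as np


def perturb_temporal_amplitude(video, strength=0.5, seed=0):
    """Rescale the temporal amplitude spectrum of a clip, keeping phase fixed.

    `video` is a real array whose first axis is time, e.g. (T, H, W).
    Illustrative assumption only: the paper's actual attack may differ.
    """
    rng = np.random.default_rng(seed)
    num_frames = video.shape[0]

    # FFT along the time axis only; spatial axes are left untouched.
    spectrum = np.fft.rfft(video, axis=0)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # One positive scale factor per temporal frequency bin,
    # broadcast over all remaining (spatial/channel) axes.
    scale_shape = (spectrum.shape[0],) + (1,) * (video.ndim - 1)
    scale = np.exp(strength * rng.standard_normal(scale_shape))

    # Recombine the perturbed amplitude with the original phase.
    perturbed = scale * amplitude * np.exp(1j * phase)
    return np.fft.irfft(perturbed, n=num_frames, axis=0)


# Usage: an 8-frame, 4x4 grayscale clip of random values.
clip = np.random.default_rng(1).random((8, 4, 4))
attacked = perturb_temporal_amplitude(clip, strength=0.5)
```

With `strength=0` every scale factor equals 1 and the clip is returned unchanged up to floating-point error. A detector that relies on temporal amplitude statistics would see shifted inputs as `strength` grows, even though the phase, which carries the timing of the motion, is preserved.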

