CV | Jan 2, 2025

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection

arXiv:2501.01184v3 | 9 citations | h-index: 27 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the detection of increasingly imperceptible deepfake videos, a problem that matters for security and media integrity. Its contribution is incremental: it improves generalization over existing methods rather than opening a new direction.

The paper tackles the challenge of generalizable deepfake video detection by proposing FakeSTormer, a multi-task learning framework that models subtle spatio-temporal inconsistencies, achieving superior performance on benchmarks compared to state-of-the-art methods.

Detecting deepfake videos is highly challenging given the complexity of characterizing spatio-temporal artifacts. Most existing methods rely on binary classifiers trained on real and fake image sequences, which hinders their generalization to unseen generation methods. Moreover, with the constant progress in generative Artificial Intelligence (AI), deepfake artifacts are becoming imperceptible at both the spatial and the temporal levels, making them extremely difficult to capture. To address these issues, we propose a fine-grained deepfake video detection approach called FakeSTormer that enforces the modeling of subtle spatio-temporal inconsistencies while avoiding overfitting. Specifically, we introduce a multi-task learning framework that incorporates two auxiliary branches for explicitly attending to artifact-prone spatial and temporal regions. Additionally, we propose a video-level data synthesis strategy that generates pseudo-fake videos with subtle spatio-temporal artifacts, providing high-quality samples and hand-free annotations for our additional branches. Extensive experiments on several challenging benchmarks demonstrate the superiority of our approach compared to recent state-of-the-art methods. The code is available at https://github.com/10Ring/FakeSTormer.
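
The multi-task setup described in the abstract (a main real/fake classifier plus two auxiliary branches for spatial and temporal artifact regions) can be illustrated with a minimal sketch of a combined training objective. Everything specific here is an assumption for illustration, not the authors' implementation: the function names, the choice of binary cross-entropy for the classifier, mean squared error for both auxiliary branches, and the weights `w_spatial` and `w_temporal` are all hypothetical. The abstract does not state which losses or weights FakeSTormer uses; see the linked repository for the actual method.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """BCE for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_squared_error(pred, target):
    """MSE between two equal-length flat lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

def multi_task_loss(cls_prob, label,
                    spatial_pred, spatial_target,
                    temporal_pred, temporal_target,
                    w_spatial=1.0, w_temporal=1.0):
    """Weighted sum of a main classification loss and two auxiliary losses.

    Assumption: the auxiliary targets are per-region "vulnerability" scores
    that a data synthesis step provides for pseudo-fake samples.
    """
    l_cls = binary_cross_entropy(cls_prob, label)
    l_spa = mean_squared_error(spatial_pred, spatial_target)
    l_tmp = mean_squared_error(temporal_pred, temporal_target)
    total = l_cls + w_spatial * l_spa + w_temporal * l_tmp
    return total, {"cls": l_cls, "spatial": l_spa, "temporal": l_tmp}

# Toy usage: one fake sample (label 1), two spatial regions, two time steps.
total, parts = multi_task_loss(
    cls_prob=0.9, label=1,
    spatial_pred=[0.2, 0.8], spatial_target=[0.0, 1.0],
    temporal_pred=[0.5, 0.1], temporal_target=[0.5, 0.0],
    w_spatial=0.5, w_temporal=0.5,
)
```

The point of the sketch is only the structure: the auxiliary terms push the shared backbone toward artifact-prone regions, while the classification term remains the main objective. How the weights are set is a design choice this sketch leaves open.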

