CV | Aug 23, 2025

A Novel Local Focusing Mechanism for Deepfake Detection Generalization

Mingliang Li, Lin Yuanbo Wu, Changhong Liu, Hanxi Li

arXiv:2508.17029v1 | 1 citation | h-index: 2 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for robust and generalizable deepfake detection methods, which is crucial for security and media integrity, though it appears incremental because it builds on existing reconstruction learning approaches.

The paper tackles poor generalization in deepfake detection across object categories and generation domains by proposing a Local Focus Mechanism (LFM), which achieves a 3.7 improvement in accuracy and a 2.8 increase in average precision over the state-of-the-art Neighboring Pixel Relationships (NPR) method while maintaining high efficiency at 1789 FPS.

The rapid advancement of deepfake generation techniques has intensified the need for robust and generalizable detection methods. Existing approaches based on reconstruction learning typically leverage deep convolutional networks to extract differential features. However, these methods show poor generalization across object categories (e.g., from faces to cars) and generation domains (e.g., from GANs to Stable Diffusion), due to intrinsic limitations of deep CNNs. First, models trained on a specific category tend to overfit to semantic feature distributions, making them less transferable to other categories, especially as network depth increases. Second, Global Average Pooling (GAP) compresses critical local forgery cues into a single vector, thus discarding discriminative patterns vital for real-fake classification. To address these issues, we propose a novel Local Focus Mechanism (LFM) that explicitly attends to discriminative local features for differentiating fake from real images. LFM integrates a Salience Network (SNet) with a task-specific Top-K Pooling (TKP) module to select the K most informative local patterns. To mitigate potential overfitting introduced by Top-K pooling, we introduce two regularization techniques: Rank-Based Linear Dropout (RBLD) and Random-K Sampling (RKS), which enhance the model's robustness. LFM achieves a 3.7 improvement in accuracy and a 2.8 increase in average precision over the state-of-the-art Neighboring Pixel Relationships (NPR) method, while maintaining exceptional efficiency at 1789 FPS on a single NVIDIA A6000 GPU. Our approach sets a new benchmark for cross-domain deepfake detection. The source code is available at https://github.com/lmlpy/LFM.git
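The contrast the abstract draws between GAP and Top-K pooling can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy 4x4 salience map, and the choice to average the K selected scores are all assumptions made for illustration, and `random_k_pool` is only one possible reading of Random-K Sampling, which the abstract does not define.

```python
import numpy as np


def global_average_pool(scores):
    """Average every per-location score into a single value (GAP-style)."""
    return float(np.mean(scores))


def top_k_pool(scores, k):
    """Average only the k highest per-location scores (illustrative Top-K pooling)."""
    flat = np.asarray(scores, dtype=float).ravel()
    k = max(1, min(k, flat.size))  # clamp k to a valid range
    # np.partition leaves the k largest values in the last k slots (unordered).
    top = np.partition(flat, flat.size - k)[flat.size - k:]
    return float(np.mean(top))


def random_k_pool(scores, k_max, rng):
    """Hypothetical training-time variant: draw K uniformly from 1..k_max.

    This is an assumed interpretation of Random-K Sampling, not the paper's definition.
    """
    k = int(rng.integers(1, k_max + 1))
    return top_k_pool(scores, k)


# Toy salience map (assumed data): mostly zero, with two strong local responses.
salience = np.zeros((4, 4))
salience[1, 2] = 0.9
salience[3, 0] = 0.7

gap_score = global_average_pool(salience)  # about 0.1: local cues are diluted
topk_score = top_k_pool(salience, k=2)     # about 0.8: local cues are preserved
randk_score = random_k_pool(salience, k_max=2, rng=np.random.default_rng(0))

print(gap_score, topk_score, randk_score)
```

In this toy case, averaging over all 16 locations shrinks the two strong responses to roughly 0.1, whereas pooling over the top two locations keeps them at roughly 0.8, which is the kind of local evidence the abstract argues GAP discards.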
