CV CR LG · Nov 18, 2025

ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection

arXiv:2511.14554v1

Originality: Incremental advance

AI Analysis

This work addresses the threat that deepfakes pose to information integrity and societal stability and offers more robust detection. The advance is incremental, since it builds on existing multi-modal approaches.

The paper tackles deepfake detection with a tri-modal forensic framework that fuses RGB, texture, and frequency evidence, achieving an AUC of 0.9752 and an accuracy of 0.9208 on Celeb-DF (v2).

Deepfakes generated by advanced GANs and autoencoders severely threaten information integrity and societal stability. Single-stream CNNs fail to capture multi-scale forgery artifacts across spatial, texture, and frequency domains, limiting robustness and generalization. We introduce ForensicFlow, a tri-modal forensic framework that synergistically fuses RGB, texture, and frequency evidence for video deepfake detection. The RGB branch (ConvNeXt-tiny) extracts global visual inconsistencies; the texture branch (Swin Transformer-tiny) detects fine-grained blending artifacts; the frequency branch (CNN + SE) identifies periodic spectral noise. Attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive attention fusion balances branch contributions. Trained on Celeb-DF (v2) with Focal Loss, ForensicFlow achieves an AUC of 0.9752, an F1-score of 0.9408, and an accuracy of 0.9208, outperforming single-stream baselines. Ablation validates branch synergy; Grad-CAM confirms forensic focus. This comprehensive feature fusion provides superior resilience against subtle forgeries.
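
The abstract does not give formulas for its attention pooling, branch fusion, or loss, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes both attention steps reduce to a softmax-weighted sum (over frames for temporal pooling, over the three branch embeddings for fusion) and that "Focal Loss" means the standard binary form with the commonly used defaults gamma = 2 and alpha = 0.25. The function names `softmax`, `attention_pool`, and `focal_loss`, and all scores and vectors, are hypothetical; in a real model the scores would come from learned layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Softmax-weighted sum of equal-length feature vectors.

    features: one vector per frame (temporal pooling) or per branch (fusion).
    scores:   one relevance score per vector. ASSUMPTION: supplied by hand
              here; a trained network would predict them.
    Returns the pooled vector and the attention weights.
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for a single sample.

    p: predicted probability of class 1; y: label (0 or 1).
    ASSUMPTION: gamma and alpha are common defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Temporal pooling: the third frame has the highest score, so it dominates.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clip_vec, frame_w = attention_pool(frames, [0.1, 0.1, 3.0])

# Branch fusion: the same operation over hypothetical per-branch embeddings.
rgb, texture, freq = [0.2, 0.8], [0.6, 0.4], [0.9, 0.1]
fused, branch_w = attention_pool([rgb, texture, freq], [1.0, 0.5, 0.2])

# Focal loss down-weights a confident correct prediction relative to a hard one.
easy, hard = focal_loss(0.9, 1), focal_loss(0.6, 1)
```

Reusing one pooling function for both steps is a simplification chosen for brevity; the paper may use different scoring networks or fusion forms for frames and branches.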
