Video Forgery Detection with Optical Flow Residuals and Spatial-Temporal Consistency
This work addresses video forgery detection, a problem relevant to security and media integrity. The contribution is incremental: it builds on existing methods by integrating complementary features.
The paper tackles the detection of AI-generated forged videos. It proposes a framework that combines RGB appearance features with optical flow residuals to capture spatial-temporal inconsistencies, and it reports robust performance across ten diverse generative models on text-to-video and image-to-video tasks.
The rapid advancement of diffusion-based video generation models has led to increasingly realistic synthetic content, presenting new challenges for video forgery detection. Existing methods often struggle to capture fine-grained temporal inconsistencies, particularly in AI-generated videos with high visual fidelity and coherent motion. In this work, we propose a detection framework that leverages spatial-temporal consistency by combining RGB appearance features with optical flow residuals. The model adopts a dual-branch architecture, where one branch analyzes RGB frames to detect appearance-level artifacts, while the other processes flow residuals to reveal subtle motion anomalies caused by imperfect temporal synthesis. By integrating these complementary features, the proposed method effectively detects a wide range of forged videos. Extensive experiments on text-to-video and image-to-video tasks across ten diverse generative models demonstrate the robustness and strong generalization ability of the proposed approach.
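To make the flow-residual idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not define how the residual is computed, so the sketch assumes one plausible reading: the per-pixel difference between two consecutive optical flow fields, which stays near zero under smooth motion and grows when motion is temporally inconsistent. The function names (`flow_residual`, `mean_residual_magnitude`, `fuse_scores`) are hypothetical, and the weighted score fusion is a simple stand-in for the feature-level integration of the two branches described above.

```python
import math
from typing import List, Tuple

# A flow field is an H x W grid of (dx, dy) displacement vectors.
Flow = List[List[Tuple[float, float]]]


def flow_residual(prev_flow: Flow, next_flow: Flow) -> Flow:
    """Per-pixel difference between two consecutive flow fields.

    Assumption: this is only one possible definition of a "flow residual";
    the source does not specify the exact formulation.
    """
    return [
        [(nx - px, ny - py) for (px, py), (nx, ny) in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_flow, next_flow)
    ]


def mean_residual_magnitude(residual: Flow) -> float:
    """Average Euclidean length of the residual vectors (0.0 if empty)."""
    magnitudes = [math.hypot(dx, dy) for row in residual for dx, dy in row]
    return sum(magnitudes) / len(magnitudes) if magnitudes else 0.0


def fuse_scores(rgb_score: float, flow_score: float, rgb_weight: float = 0.5) -> float:
    """Weighted average of two branch scores.

    Assumption: late score fusion stands in for the paper's feature
    integration, which the abstract does not describe in detail.
    """
    return rgb_weight * rgb_score + (1.0 - rgb_weight) * flow_score


# Smooth motion: the flow does not change between frame pairs.
steady = [[(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]
# Inconsistent motion: the flow jumps abruptly in the next frame pair.
jumpy = [[(4.0, 4.0), (4.0, 4.0)], [(4.0, 4.0), (4.0, 4.0)]]

smooth_energy = mean_residual_magnitude(flow_residual(steady, steady))
jitter_energy = mean_residual_magnitude(flow_residual(steady, jumpy))
```

In this toy input the steady sequence yields zero residual energy while the abrupt jump yields a large one, which is the kind of motion cue the flow branch is meant to expose. In a real system the flow fields would come from an optical flow estimator and both branches would be learned networks rather than hand-written statistics.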