Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method
This paper addresses the security risks posed by realistic AI-generated videos, which makes it relevant to cybersecurity applications, but the contribution is incremental: it builds on existing detection methods and adds a new dataset.
The paper tackles the detection of AI-generated videos in two parts. It constructs a benchmark dataset from diffusion-based generation algorithms together with degraded samples, and it proposes a detection method based on local and global temporal defects. The reported results are intended to serve as a baseline for future studies.
Generative models have made significant advances in creating realistic videos, which raises security concerns. However, this emerging risk has not been adequately addressed because no benchmark dataset for AI-generated videos exists. In this paper, we first construct a video dataset covering various semantic contents using advanced diffusion-based video generation algorithms. In addition, we apply typical lossy operations that videos undergo during network transmission to generate degraded samples. Then, by analyzing the local and global temporal defects of current AI-generated videos, we construct a novel detection framework that adaptively learns local motion information and global appearance variation to expose fake videos. Finally, we conduct experiments to evaluate the generalization and robustness of different spatial-domain and temporal-domain detection methods; the results can serve as a baseline and demonstrate the research challenges for future studies.
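To make the two kinds of temporal cue mentioned above more concrete, here is a minimal sketch using hand-crafted stand-ins. It is not the paper's framework, which learns these cues adaptively and whose architecture the abstract does not describe. The function names `local_motion_cue` and `global_appearance_cue`, the frame-difference measure, and the mean/standard-deviation descriptor are all assumptions chosen for illustration.

```python
import numpy as np


def local_motion_cue(frames):
    """Hypothetical local cue: mean absolute difference between consecutive frames.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Returns an array of length T - 1.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Difference along the time axis captures short-range (frame-to-frame) change.
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.reshape(diffs.shape[0], -1).mean(axis=1)


def global_appearance_cue(frames):
    """Hypothetical global cue: how far each frame drifts from the clip average.

    Each frame is summarized by a crude descriptor (mean and standard deviation
    of its pixel values). The cue is the Euclidean distance between each frame's
    descriptor and the mean descriptor of the whole clip.
    Returns an array of length T.
    """
    frames = np.asarray(frames, dtype=np.float64)
    flat = frames.reshape(frames.shape[0], -1)
    descriptors = np.stack([flat.mean(axis=1), flat.std(axis=1)], axis=1)
    clip_descriptor = descriptors.mean(axis=0)
    return np.linalg.norm(descriptors - clip_descriptor, axis=1)


if __name__ == "__main__":
    # Synthetic clip whose brightness increases by one unit per frame.
    clip = np.stack([np.full((8, 8, 3), float(t)) for t in range(4)])
    print(local_motion_cue(clip))
    print(global_appearance_cue(clip))
```

In a learned detector, features of this kind would be replaced by trainable modules; the sketch only shows the distinction between a short-range (frame-to-frame) signal and a clip-level signal.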