CV | May 24, 2024

Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

arXiv:2405.15343v1 | 17 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for robust tools to combat video scams and copyright disputes. It is an incremental advance: it builds on existing detection methods, adding a new dataset and motion features.

The paper tackles the problem of detecting AI-generated videos to prevent misuse like scams, by introducing a large-scale dataset (GenVidDet) with over 2.66 million instances and a dual-branch 3D transformer method (DuB3D) that achieves 96.77% accuracy.

The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those produced by Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the dataset and neural classifier for training. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66 M instances of both real and generated videos, varying in categories, frames per second, resolutions, and lengths. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, achieving the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D distinguishes between real and generated video content with 96.77% accuracy and shows strong generalization capability even for unseen types.
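To make the dual-branch idea in the abstract more concrete, the following is a toy sketch of two feature branches combined by an adaptive gate. It is not the DuB3D implementation. Every name here (`appearance_features`, `motion_features`, `fuse`, `score`) is hypothetical, simple pixel statistics stand in for the 3D transformer, absolute frame differences stand in for optical flow, and the gate and classifier weights are fixed placeholders rather than learned parameters.

```python
import math
from typing import List, Sequence

Frame = List[float]  # one flattened grayscale frame (assumption for this toy)
Video = List[Frame]  # T frames of equal length


def _mean_var(values: Sequence[float]) -> List[float]:
    """Return [mean, variance] of a non-empty sequence."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [mean, var]


def appearance_features(video: Video) -> List[float]:
    """Appearance branch stand-in: pixel intensity statistics over all frames."""
    return _mean_var([p for frame in video for p in frame])


def motion_features(video: Video) -> List[float]:
    """Motion branch stand-in: statistics of absolute frame differences.

    This is a crude proxy, NOT optical flow.
    """
    if len(video) < 2:
        raise ValueError("need at least two frames to measure motion")
    diffs = [
        abs(b - a)
        for prev, cur in zip(video, video[1:])
        for a, b in zip(prev, cur)
    ]
    return _mean_var(diffs)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(app: Sequence[float], mot: Sequence[float],
         gate_logits: Sequence[float]) -> List[float]:
    """Adaptive fusion: a softmax gate weights each branch before summing."""
    w_app, w_mot = softmax(gate_logits)
    return [w_app * a + w_mot * m for a, m in zip(app, mot)]


def score(video: Video,
          gate_logits: Sequence[float] = (0.0, 0.0),
          weights: Sequence[float] = (1.0, 1.0),
          bias: float = 0.0) -> float:
    """Return a pseudo-probability that the video is generated.

    The gate logits, weights, and bias are placeholder values; in a real
    detector they would be learned from labeled data.
    """
    fused = fuse(appearance_features(video), motion_features(video), gate_logits)
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

The gate is the part that corresponds to "adaptively leverages and fuses": raising the second gate logit shifts weight toward the motion branch, and lowering it shifts weight toward appearance.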

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
