GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection
This work addresses the problem of scaling video anomaly detection for public safety applications. The contribution is incremental: it builds on existing weakly-supervised methods and adds a new data augmentation approach.
The paper tackles the scarcity and high cost of real-world anomaly data in video anomaly detection. It proposes a framework that uses text-conditioned video generation to create synthetic videos for data augmentation, and it reports improved performance over state-of-the-art methods on the UCF-Crime dataset.
Video anomaly detection (VAD) plays a critical role in public safety applications such as intelligent surveillance. However, the rarity, unpredictability, and high annotation cost of real-world anomalies make it difficult to scale VAD datasets, which limits the performance and generalization ability of existing models. To address this challenge, we propose a generative video-enhanced weakly-supervised video anomaly detection (GV-VAD) framework that leverages text-conditioned video generation models to produce semantically controllable and physically plausible synthetic videos. These synthetic videos are used to augment the training data at low cost. In addition, a synthetic sample loss scaling strategy controls the influence of the generated samples for efficient training. Experiments show that the proposed framework outperforms state-of-the-art methods on the UCF-Crime dataset. The code is available at https://github.com/Sumutan/GV-VAD.git.
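The abstract does not give the exact form of the synthetic sample loss scaling strategy. The following is only a minimal sketch of one plausible reading, assuming a video-level binary cross-entropy loss in which each synthetic video is multiplied by a constant weight while real videos keep weight 1. The function names, the `synthetic_weight` parameter, and its default value are hypothetical and are not taken from the paper or its repository.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted anomaly score p and label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def scaled_video_loss(scores, labels, is_synthetic, synthetic_weight=0.5):
    """Mean video-level loss with down-weighted synthetic samples.

    Assumption: synthetic videos are scaled by a constant factor
    (synthetic_weight, a hypothetical hyperparameter); real videos use 1.0.
    """
    if not (len(scores) == len(labels) == len(is_synthetic)):
        raise ValueError("scores, labels and is_synthetic must have equal length")
    if not scores:
        return 0.0
    total = 0.0
    for p, y, synth in zip(scores, labels, is_synthetic):
        weight = synthetic_weight if synth else 1.0
        total += weight * bce(p, y)
    return total / len(scores)


# Two real videos and one generated video (illustrative values only).
scores = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
is_synthetic = [False, False, True]
loss = scaled_video_loss(scores, labels, is_synthetic, synthetic_weight=0.5)
```

Under this assumption, a weight of 1 treats generated videos exactly like real ones, and a weight of 0 ignores them, so the weight acts as a single knob for how much the synthetic data is trusted.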