A Coarse-to-Fine Pseudo-Labeling (C2FPL) Framework for Unsupervised Video Anomaly Detection
This addresses the problem of detecting anomalous events in surveillance videos without any labeled data, which is a significant challenge in the field, representing a novel approach rather than an incremental improvement.
The paper tackles fully unsupervised video anomaly detection by proposing a two-stage pseudo-label generation framework that identifies anomalous video segments without any annotations, achieving superior performance compared to existing unsupervised and one-class classification methods and comparable results to state-of-the-art weakly supervised methods on datasets like UCF-Crime and XD-Violence.
Detection of anomalous events in videos is an important problem in applications such as surveillance. Video anomaly detection (VAD) is well-studied in the one-class classification (OCC) and weakly supervised (WS) settings. However, fully unsupervised (US) video anomaly detection methods, which learn a complete system without any annotation or human supervision, have not been explored in depth. This is because the lack of any ground truth annotations significantly increases the magnitude of the VAD challenge. To address this challenge, we propose a simple-but-effective two-stage pseudo-label generation framework that produces segment-level (normal/anomaly) pseudo-labels, which can be further used to train a segment-level anomaly detector in a supervised manner. The proposed coarse-to-fine pseudo-label (C2FPL) generator employs carefully-designed hierarchical divisive clustering and statistical hypothesis testing to identify anomalous video segments from a set of completely unlabeled videos. The trained anomaly detector can be directly applied on segments of an unseen test video to obtain segment-level, and subsequently, frame-level anomaly predictions. Extensive studies on two large-scale public-domain datasets, UCF-Crime and XD-Violence, demonstrate that the proposed unsupervised approach achieves superior performance compared to all existing OCC and US methods , while yielding comparable performance to the state-of-the-art WS methods.