Weakly-Supervised Spatiotemporal Anomaly Detection
This work addresses the need for efficient anomaly detection in videos without expensive frame-level annotations, offering a practical solution for surveillance applications.
The paper proposes a weakly-supervised method for spatiotemporal anomaly detection using only video-level labels, achieving localization of anomalies in both space and time on the UCF Crime2Local dataset.
In this paper, we explore a weakly supervised method for anomaly detection. Because annotating videos is time-consuming, we use only weak video-level labels during training: for each video, we know whether it is normal or contains an anomaly, but no further annotations are used to train the network. Features are extracted from video clips that are either normal or anomalous. A classifier uses these features to assign anomaly scores to spatiotemporal regions of each clip, and it is trained with a multiple instance learning (MIL) ranking loss. To apply MIL, we represent anomalous and normal video clips as positive and negative bags, respectively. Furthermore, since an anomaly is usually confined to part of a frame rather than the whole frame, we explore spatial as well as temporal anomaly detection. We show our results on the UCF Crime2Local dataset, which contains spatiotemporal annotations for a portion of the UCF Crime dataset.
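To make the bag formulation concrete, here is a minimal sketch of a MIL ranking objective over spatiotemporal instances. It is not the paper's implementation. Several choices are assumptions made for illustration: each clip is one bag, its instances are (temporal segment, spatial region) pairs stored as a hypothetical T x R grid of classifier scores, the loss is a common hinge form with an assumed margin of 1.0, and any extra regularization terms are omitted. The helper names `flatten_bag`, `mil_ranking_loss`, `batch_ranking_loss`, and `localize` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout (not specified in the paper): a clip's instance scores are
# stored as a T x R grid, with T temporal segments and R spatial regions each.
ScoreGrid = List[List[float]]


def flatten_bag(grid: ScoreGrid) -> List[float]:
    """Turn a T x R grid of instance scores into a flat MIL bag."""
    return [score for segment in grid for score in segment]


def mil_ranking_loss(pos_bag: Sequence[float], neg_bag: Sequence[float],
                     margin: float = 1.0) -> float:
    """Hinge ranking loss: the top instance of the anomalous (positive) bag
    should outscore the top instance of the normal (negative) bag by at least
    `margin`. The hinge form and the margin value are assumptions."""
    return max(0.0, margin - max(pos_bag) + max(neg_bag))


def batch_ranking_loss(pos_grids: Sequence[ScoreGrid], neg_grids: Sequence[ScoreGrid],
                       margin: float = 1.0) -> float:
    """Average the ranking loss over paired anomalous/normal clips."""
    losses = [mil_ranking_loss(flatten_bag(p), flatten_bag(n), margin)
              for p, n in zip(pos_grids, neg_grids)]
    return sum(losses) / len(losses)


def localize(grid: ScoreGrid) -> Tuple[int, int]:
    """Return (segment, region) of the highest-scoring instance, i.e. the
    predicted spatiotemporal location of the anomaly in this sketch."""
    cells = [(t, r) for t, segment in enumerate(grid) for r in range(len(segment))]
    return max(cells, key=lambda tr: grid[tr[0]][tr[1]])


if __name__ == "__main__":
    # Toy scores: 2 temporal segments x 3 spatial regions per clip.
    anomalous = [[0.1, 0.2, 0.9], [0.3, 0.4, 0.2]]
    normal = [[0.2, 0.1, 0.3], [0.25, 0.15, 0.05]]
    print(mil_ranking_loss(flatten_bag(anomalous), flatten_bag(normal)))  # about 0.4
    print(localize(anomalous))  # (0, 2)
```

Because only the maximum of each bag enters the loss, a video-level label is enough to push the single most suspicious (segment, region) instance of an anomalous clip above every instance of a normal clip. At test time, that same top-scoring instance gives a location in both time and space.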