Consistency-based Self-supervised Learning for Temporal Anomaly Localization
This work addresses the localization of anomalous activities in videos for applications such as surveillance. The contribution is incremental: it builds on existing regularization methods by adding a self-supervised learning objective.
The paper tackles weakly supervised anomaly detection in videos, where only video-level labels are available. It proposes a consistency-based self-supervised learning method that enforces alignment of anomaly scores across different augmentations of the same video, and reports improved performance on the XD-Violence dataset.
This work tackles weakly supervised anomaly detection, in which a predictor is allowed to learn not only from normal examples but also from a few labeled anomalies made available during training. In particular, we deal with the localization of anomalous activities within the video stream: this is a very challenging scenario, as training examples come only with video-level annotations (and not frame-level ones). Several recent works have proposed regularization terms to address this problem, e.g. by enforcing sparsity and smoothness constraints over the weakly learned frame-level anomaly scores. In this work, we draw inspiration from recent advances in self-supervised learning and ask the model to yield the same scores for different augmentations of the same video sequence. We show that enforcing such an alignment improves the performance of the model on XD-Violence.
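As an illustration of the three constraints mentioned above (sparsity, smoothness, and consistency across augmentations), the following is a minimal sketch and not the formulation used in this work. It assumes that frame-level anomaly scores are already available as 1-D arrays, that the two augmented views stay aligned frame by frame, and that each penalty takes a simple L1 or mean-squared form. All function names and the weights `w_sparse`, `w_smooth`, and `w_cons` are hypothetical.

```python
import numpy as np

def sparsity_penalty(scores):
    """Mean absolute score: assumes anomalies are rare, so most scores stay near zero."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.abs(scores)))

def smoothness_penalty(scores):
    """Mean squared difference between adjacent frames (needs at least two frames)."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.diff(scores) ** 2))

def consistency_penalty(scores_a, scores_b):
    """Mean squared gap between the scores of two augmented views of the same video.

    Assumption: both views have the same number of frames and stay temporally aligned.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return float(np.mean((scores_a - scores_b) ** 2))

def total_regularizer(scores_a, scores_b, w_sparse=1.0, w_smooth=1.0, w_cons=1.0):
    """Weighted sum of the three penalties; the weights are illustrative placeholders."""
    # Sparsity and smoothness are averaged over the two views.
    sparse = 0.5 * (sparsity_penalty(scores_a) + sparsity_penalty(scores_b))
    smooth = 0.5 * (smoothness_penalty(scores_a) + smoothness_penalty(scores_b))
    cons = consistency_penalty(scores_a, scores_b)
    return w_sparse * sparse + w_smooth * smooth + w_cons * cons
```

In a training loop, a term of this kind would be added to the video-level classification loss. The consistency penalty vanishes exactly when both augmented views receive identical frame-level scores, which is the alignment the abstract describes.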