Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
This work addresses scene-aware video anomaly detection, which is important for surveillance and security applications, but it appears incremental, since it builds on existing autoencoder-based frameworks and adds contrastive learning and augmentation techniques.
The paper tackles the challenge of increasing scene-awareness in video anomaly detection by proposing a hierarchical semantic contrast method that learns from normal videos, and it validates the method's effectiveness on three public datasets and on scene-dependent mixture datasets.
Increasing scene-awareness is a key challenge in video anomaly detection (VAD). In this work, we propose a hierarchical semantic contrast (HSC) method to learn a scene-aware VAD model from normal videos. We first incorporate foreground object and background scene features with high-level semantics, extracted by pre-trained video parsing models. Then, building upon the autoencoder-based reconstruction framework, we introduce both scene-level and object-level contrastive learning to make the encoded latent features compact within the same semantic class while separable across different classes. This hierarchical semantic contrast strategy helps to cope with the diversity of normal patterns and also increases their discriminability. Moreover, to handle rare normal activities, we design a skeleton-based motion augmentation that increases the number of such samples and further refines the model. Extensive experiments on three public datasets and scene-dependent mixture datasets validate the effectiveness of our proposed method.
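The scene-level and object-level contrast described above can be illustrated with a minimal sketch. The abstract does not state the exact loss, so this sketch assumes a supervised contrastive (SupCon-style) formulation applied twice: once to scene-level latent features grouped by scene label, and once to object-level latent features grouped by object class. The names `supcon_loss` and `hierarchical_contrast_loss`, the temperature, and the weights `lam_scene`/`lam_object` are hypothetical; in a full model this term would be added to the autoencoder reconstruction loss.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Unit-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """SupCon-style loss (assumed stand-in): pulls latents with the same
    semantic label together and pushes latents with different labels apart."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        sims = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def hierarchical_contrast_loss(scene_feats, scene_labels, object_feats, object_labels,
                               lam_scene: float = 1.0, lam_object: float = 1.0,
                               temperature: float = 0.1) -> float:
    # Hypothetical combination: a weighted sum of scene-level and object-level terms.
    return (lam_scene * supcon_loss(scene_feats, scene_labels, temperature)
            + lam_object * supcon_loss(object_feats, object_labels, temperature))


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    # Labels that match the feature clusters give a low loss; mismatched labels a high one.
    print(supcon_loss(feats, [0, 0, 1, 1]), supcon_loss(feats, [0, 1, 0, 1]))
```

Normalizing before the dot product makes each similarity a cosine, so the temperature alone controls how sharply the loss separates classes. Running the same function at both levels is the simplest reading of "hierarchical"; the actual method may weight, sample, or structure the two levels differently.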