CVApr 8, 2022

A Video Anomaly Detection Framework based on Appearance-Motion Semantics Representation Consistency

arXiv:2204.04151v128 citationsh-index: 14
Originality Incremental advance
AI Analysis

This addresses the challenge of detecting anomalies in surveillance videos without anomalous training data, but it is incremental as it builds on existing reconstruction and prediction methods.

The paper tackled video anomaly detection by proposing a framework that enforces consistency between appearance and motion semantics in normal samples, identifying anomalies based on low consistency, and achieved effectiveness in experiments.

Video anomaly detection refers to the identification of events that deviate from the expected behavior. Due to the lack of anomalous samples in training, video anomaly detection becomes a very challenging task. Existing methods almost follow a reconstruction or future frame prediction mode. However, these methods ignore the consistency between appearance and motion information of samples, which limits their anomaly detection performance. Anomalies only occur in the moving foreground of surveillance videos, so the semantics expressed by video frame sequences and optical flow without background information in anomaly detection should be highly consistent and significant for anomaly detection. Based on this idea, we propose Appearance-Motion Semantics Representation Consistency (AMSRC), a framework that uses normal data's appearance and motion semantic representation consistency to handle anomaly detection. Firstly, we design a two-stream encoder to encode the appearance and motion information representations of normal samples and introduce constraints to further enhance the consistency of the feature semantics between appearance and motion information of normal samples so that abnormal samples with low consistency appearance and motion feature representation can be identified. Moreover, the lower consistency of appearance and motion features of anomalous samples can be used to generate predicted frames with larger reconstruction error, which makes anomalies easier to spot. Experimental results demonstrate the effectiveness of the proposed method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes