CV · Mar 9, 2023

Multi-level Memory-augmented Appearance-Motion Correspondence Framework for Video Anomaly Detection

Xiangyu Huang, Caidan Zhao, Jinghui Yu, Chenxing Gao, Zhiqiang Wu

arXiv:2303.05116v1 · 7.6 · 12 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses video anomaly detection for surveillance and security applications, offering incremental improvements over existing methods.

The paper tackles two problems in unsupervised video anomaly detection, the underutilized correlation between appearance and motion and the uncontrollable generalization of deep AutoEncoders, by proposing a multi-level memory-augmented framework. It reports state-of-the-art AUCs of 99.6%, 93.8%, and 76.3% on UCSD Ped2, CUHK Avenue, and ShanghaiTech, respectively.
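The summary does not describe how the memory is read or how the Memory-Guided Suppression Module damps skip connections. The following is a minimal sketch under stated assumptions, not the paper's module: it assumes a MemAE-style soft read (softmax over cosine similarities to a bank of normal prototype vectors) and a hypothetical Gaussian gate that scales a skip-connection feature by how far the query lies from its memory read-out. The names `memory_read`, `suppress_skip`, `temperature`, and `sigma` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def memory_read(query: Sequence[float], prototypes: Sequence[Sequence[float]],
                temperature: float = 0.1) -> Vector:
    """Soft memory read (assumed MemAE-style): softmax over cosine similarities,
    then a weighted sum of the prototype vectors."""
    logits = [cosine(query, m) / temperature for m in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * m[d] for w, m in zip(weights, prototypes)) for d in range(len(query))]


def suppress_skip(skip: Sequence[float], query: Sequence[float],
                  prototypes: Sequence[Sequence[float]], sigma: float = 1.0,
                  temperature: float = 0.1) -> Tuple[Vector, float]:
    """Hypothetical gate: the farther the query is from its memory read-out
    (i.e. the less it resembles normal prototypes), the more the skip feature is damped."""
    read = memory_read(query, prototypes, temperature)
    dist2 = sum((q - r) ** 2 for q, r in zip(query, read))
    gate = math.exp(-dist2 / sigma)  # near 1.0 for a prototype-like query, toward 0 as it drifts away
    return [gate * s for s in skip], gate


if __name__ == "__main__":
    prototypes = [[1.0, 0.0], [0.0, 1.0]]  # toy bank of "normal" prototypes
    skip = [0.8, 0.3]                      # toy skip-connection feature
    _, g_normal = suppress_skip(skip, [1.0, 0.0], prototypes)
    _, g_abnormal = suppress_skip(skip, [-1.0, -1.0], prototypes)
    print(f"{g_normal:.3f} {g_abnormal:.3f}")  # prints 1.000 0.011
```

The gate leaves skip features of prototype-like inputs almost untouched while shrinking those of inputs far from every prototype, which is one simple way to trade good reconstruction of normal data against poor reconstruction of abnormal data.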

Frame prediction based on AutoEncoders plays a significant role in unsupervised video anomaly detection. Ideally, models trained only on normal data should produce larger prediction errors on anomalies. However, the correlation between appearance and motion information is underutilized, which leaves the models without a sufficient understanding of normal patterns. Moreover, the models perform poorly because of the uncontrollable generalizability of deep AutoEncoders. To tackle these problems, we propose a multi-level memory-augmented appearance-motion correspondence framework. The latent correspondence between appearance and motion is explored via appearance-motion semantics alignment and semantics replacement training. We also introduce a Memory-Guided Suppression Module, which uses the difference from normal prototype features to suppress the reconstruction capacity introduced by skip connections, striking a tradeoff between good reconstruction of normal data and poor reconstruction of abnormal data. Experimental results show that our framework outperforms state-of-the-art methods, achieving AUCs of 99.6%, 93.8%, and 76.3% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.
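The abstract turns prediction errors into anomaly evidence and reports AUC, which on these benchmarks is usually computed at frame level. The exact scoring rule is not given here, so the following sketch assumes a convention common in frame-prediction work: per-frame PSNR between the predicted and real frame, min-max normalised within each video, with the anomaly score taken as one minus the normalised PSNR. AUC is computed with the Mann-Whitney rank statistic. The function names `psnr`, `anomaly_scores`, and `roc_auc` are hypothetical.

```python
import math
from typing import List, Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR between a predicted and a real frame, both flattened to pixel lists in [0, max_val]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    mse = max(mse, 1e-10)  # avoid an infinite PSNR for a perfect prediction
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(psnrs: Sequence[float]) -> List[float]:
    """Min-max normalise PSNR within one video; low PSNR (large error) maps to a score near 1."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0 for _ in psnrs]
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that an abnormal frame (label 1) outscores
    a normal frame (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal frame")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    target = [0.5] * 4                                      # toy 4-pixel real frame
    preds = [[0.5] * 4, [0.52] * 4, [0.9] * 4, [0.49] * 4]  # third prediction is badly off
    labels = [0, 0, 1, 0]                                   # third frame is the anomaly
    scores = anomaly_scores([psnr(p, target) for p in preds])
    print(roc_auc(scores, labels))  # prints 1.0
```

The pairwise loop is quadratic and only suits a toy example; on full benchmarks a sort-based rank computation or `sklearn.metrics.roc_auc_score` would be the practical choice.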

