Future Video Prediction from a Single Frame for Video Anomaly Detection
This work addresses the challenge of long-term motion modeling in semi-supervised video anomaly detection, which matters for security and surveillance applications, though it is an incremental improvement over existing proxy-task approaches.
The paper tackles video anomaly detection by introducing future video prediction from a single frame as a proxy-task, and reports better performance than state-of-the-art prediction-based methods on the ShanghaiTech, UCSD-Ped1, and UCSD-Ped2 benchmark datasets.
Video anomaly detection (VAD) is an important but challenging task in computer vision. The main challenge arises from the scarcity of anomalous training samples, which makes it infeasible to model all anomaly cases. Hence, semi-supervised anomaly detection methods have attracted more attention, since they model only normal patterns and detect anomalies by measuring deviations from those patterns. Despite the impressive advances of these methods in modeling normal motion and appearance, long-term motion modeling has not yet been effectively explored. Inspired by the abilities of the future frame prediction proxy-task, we introduce the task of future video prediction from a single frame as a novel proxy-task for video anomaly detection. This proxy-task alleviates the difficulty previous methods have in learning longer motion patterns. Moreover, we replace the initial and future raw frames with their corresponding semantic segmentation maps, which not only makes the method aware of object classes but also makes the prediction task less complex for the model. Extensive experiments on the benchmark datasets (ShanghaiTech, UCSD-Ped1, and UCSD-Ped2) show the effectiveness of the method and its superior performance compared to state-of-the-art prediction-based VAD methods.
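To make the idea of "measuring deviations from normal patterns" concrete, the following is a minimal sketch of how a prediction-based anomaly score could be computed over semantic segmentation maps. It is an illustration under stated assumptions, not the scoring function of the paper, which the abstract does not specify. The names `SegMap`, `frame_disagreement`, and `clip_anomaly_score` are hypothetical, maps are assumed to be grids of integer class ids, and the score is assumed to be the average fraction of pixels whose predicted class disagrees with the observed class over the predicted future frames.

```python
from typing import Sequence

# Hypothetical representation: an H x W grid of integer class ids.
SegMap = Sequence[Sequence[int]]


def frame_disagreement(predicted: SegMap, observed: SegMap) -> float:
    """Fraction of pixels whose predicted class differs from the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("segmentation maps must have the same height")
    total = 0
    mismatched = 0
    for pred_row, obs_row in zip(predicted, observed):
        if len(pred_row) != len(obs_row):
            raise ValueError("segmentation maps must have the same width")
        for pred_class, obs_class in zip(pred_row, obs_row):
            total += 1
            mismatched += int(pred_class != obs_class)
    if total == 0:
        raise ValueError("segmentation maps must not be empty")
    return mismatched / total


def clip_anomaly_score(
    predicted_clip: Sequence[SegMap], observed_clip: Sequence[SegMap]
) -> float:
    """Mean per-frame disagreement over the predicted future frames.

    Assumption: a model trained only on normal videos predicts normal
    futures well, so a high disagreement hints at an anomaly.
    """
    if len(predicted_clip) != len(observed_clip):
        raise ValueError("clips must contain the same number of frames")
    if len(predicted_clip) == 0:
        raise ValueError("clips must not be empty")
    per_frame = [
        frame_disagreement(pred, obs)
        for pred, obs in zip(predicted_clip, observed_clip)
    ]
    return sum(per_frame) / len(per_frame)


# Toy 2 x 2 maps: class 0 = background, 1 = person, 2 = unexpected object.
predicted = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
observed = [[[0, 0], [1, 1]], [[0, 2], [2, 1]]]

normal_score = clip_anomaly_score(predicted, predicted)
anomalous_score = clip_anomaly_score(predicted, observed)
```

In this toy example the second observed frame differs from the prediction in two of four pixels, so the clip-level score is higher than for a perfectly predicted clip. In practice, such raw scores would typically be thresholded or normalized per video, which is a design choice left outside this sketch.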