Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks
This work addresses the challenge of automating anomaly detection in videos with limited supervision. The contribution is incremental, as it builds on existing Conv-LSTM methods.
The authors tackled the problem of detecting anomalous events in long video sequences by proposing end-to-end trainable composite Conv-LSTM networks that predict video evolution from a few input frames, achieving competitive results on anomaly detection datasets.
Automating the detection of anomalous events within long video sequences is challenging due to the ambiguity of how such events are defined. We approach the problem by learning generative models that can identify anomalies in videos using limited supervision. We propose end-to-end trainable composite Convolutional Long Short-Term Memory (Conv-LSTM) networks that are able to predict the evolution of a video sequence from a small number of input frames. Regularity scores are derived from the reconstruction errors of a set of predictions, with abnormal video sequences yielding lower regularity scores as they diverge further from the actual sequence over time. The models use a composite structure, and we examine the effects of conditioning in learning more meaningful representations. The best model is chosen based on reconstruction and prediction accuracy. The Conv-LSTM models are evaluated both qualitatively and quantitatively, demonstrating competitive results on anomaly detection datasets. Conv-LSTM units are shown to be an effective tool for modeling and predicting video sequences.
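As an illustration of how a regularity score might be derived from reconstruction errors, the following is a minimal sketch rather than the formulation used in this work. It assumes a per-frame error equal to the sum of squared pixel differences and a min-max normalization over the sequence, so that the frame with the largest error scores 0 and the frame with the smallest error scores 1. The names `frame_error` and `regularity_scores` are hypothetical.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities (assumption).
Frame = Sequence[Sequence[float]]


def frame_error(predicted: Frame, actual: Frame) -> float:
    """Sum of squared pixel differences between a predicted and an actual frame."""
    return sum(
        (p - a) ** 2
        for pred_row, act_row in zip(predicted, actual)
        for p, a in zip(pred_row, act_row)
    )


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Map per-frame errors to scores in [0, 1]; higher means more regular.

    Assumption: min-max normalization over the whole sequence. The abstract
    does not state the normalization, so this is one plausible choice.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames are equally well predicted, so treat them all as regular.
        return [1.0 for _ in errors]
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


# Toy example: three 2x2 frames whose predictions drift away from the truth.
actual = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
predicted = [
    [[0.0, 0.0], [0.0, 0.0]],  # exact prediction
    [[0.1, 0.0], [0.0, 0.0]],  # small deviation
    [[1.0, 1.0], [0.0, 0.0]],  # large deviation
]
errors = [frame_error(p, a) for p, a in zip(predicted, actual)]
scores = regularity_scores(errors)
```

In this toy sequence the scores fall as the predictions diverge from the actual frames, which mirrors the behavior described above for abnormal sequences.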