CVJan 12, 2018

Detecting abnormal events in video using Narrowed Normality Clusters

arXiv:1801.05030v4113 citations
Originality Incremental advance
AI Analysis

This addresses video surveillance and security monitoring by improving detection accuracy and speed, but it is incremental as it builds on existing outlier detection and clustering techniques.

The paper tackles abnormal event detection in video by formulating it as outlier detection and proposes a two-stage algorithm using k-means clustering and one-class SVMs, achieving better results than state-of-the-art methods in most cases and processing at 24 frames per second on a single CPU.

We formulate the abnormal event detection problem as an outlier detection task and we propose a two-stage algorithm based on k-means clustering and one-class Support Vector Machines (SVM) to eliminate outliers. In the feature extraction stage, we propose to augment spatio-temporal cubes with deep appearance features extracted from the last convolutional layer of a pre-trained neural network. After extracting motion and appearance features from the training video containing only normal events, we apply k-means clustering to find clusters representing different types of normal motion and appearance features. In the first stage, we consider that clusters with fewer samples (with respect to a given threshold) contain mostly outliers, and we eliminate these clusters altogether. In the second stage, we shrink the borders of the remaining clusters by training a one-class SVM model on each cluster. To detected abnormal events in the test video, we analyze each test sample and consider its maximum normality score provided by the trained one-class SVM models, based on the intuition that a test sample can belong to only one cluster of normality. If the test sample does not fit well in any narrowed normality cluster, then it is labeled as abnormal. We compare our method with several state-of-the-art methods on three benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve better results in most cases, while processing the test video in real-time at 24 frames per second on a single CPU.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes