A Theoretical Investigation of Graph Degree as an Unsupervised Normality Measure
This work addresses anomaly detection for data analysis, but it is incremental as it builds on existing graph and kernel methods.
The paper tackles the problem of unsupervised anomaly detection by proposing graph degree as a normality measure, showing that a simple method based on this achieves higher average accuracy compared to other unsupervised methods across 10 datasets.
For a graph representation of a dataset, a straightforward normality measure for a sample can be its graph degree. Considering a weighted graph, degree of a sample is the sum of the corresponding row's values in a similarity matrix. The measure is intuitive given the abnormal samples are usually rare and they are dissimilar to the rest of the data. In order to have an in-depth theoretical understanding, in this manuscript, we investigate the graph degree in spectral graph clustering based and kernel based point of views and draw connections to a recent kernel method for the two sample problem. We show that our analyses guide us to choose fully-connected graphs whose edge weights are calculated via universal kernels. We show that a simple graph degree based unsupervised anomaly detection method with the above properties, achieves higher accuracy compared to other unsupervised anomaly detection methods on average over 10 widely used datasets. We also provide an extensive analysis on the effect of the kernel parameter on the method's accuracy.