LGMLFeb 6, 2015

Learning Efficient Anomaly Detectors from $K$-NN Graphs

arXiv:1502.01783v113 citations
Originality Incremental advance
AI Analysis

This work addresses anomaly detection for high-dimensional data, offering an incremental improvement with computational efficiency gains.

The paper tackles the problem of anomaly detection in high-dimensional data by proposing a non-parametric algorithm that uses K-NN distances and trains limited complexity models to imitate scores, achieving asymptotic optimality with convergence to the minimum volume level set. Results show superiority over existing K-NN based methods with significant computational savings.

We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average $K$-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at $α$-false alarm level if the predicted score is in the $α$-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate $α$, its decision region converges to the $α$-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing $K$-NN based anomaly detection algorithms, with significant computational savings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes