ML LG · Jul 5, 2016

How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?

arXiv:1607.01152v1 · 15.586 citations · Has Code

Originality: Incremental advance

AI Analysis

This work gives researchers and practitioners in anomaly detection a practical way to compare algorithms when labeled data are unavailable. The advance is incremental, since it builds on existing Excess-Mass and Mass-Volume curves.

The paper addresses the challenge of evaluating unsupervised anomaly detection algorithms when labeled data are scarce, proposing and testing two label-free criteria based on Excess-Mass and Mass-Volume curves, along with a feature sub-sampling and aggregating methodology to extend their use to high-dimensional datasets.

When sufficient labeled data are available, classical criteria based on Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be used to compare the performance of unsupervised anomaly detection algorithms. However, in many situations, few or no data are labeled. This calls for alternative criteria that can be computed on unlabeled data. In this paper, two criteria that do not require labels are empirically shown to discriminate accurately (w.r.t. ROC- or PR-based criteria) between algorithms. These criteria are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which generally cannot be well estimated in high dimension. A methodology based on feature sub-sampling and aggregating is also described and tested, extending the use of these criteria to high-dimensional datasets and solving major drawbacks inherent to standard EM and MV curves.
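
To make the idea concrete, here is a minimal sketch of an EM-based criterion with feature sub-sampling, using NumPy only. It is an illustration under stated assumptions, not the paper's reference implementation. It uses the standard empirical Excess-Mass form EM(t) = max_u { P_n(s(X) >= u) - t * Leb(s >= u) }, with the volume term estimated by Monte Carlo sampling over the bounding box of the data. The function names, the `toy_score` stand-in detector, the rescaled grid `tau = t * volume`, and all default parameters are hypothetical choices made for this example.

```python
import numpy as np

def em_curve(scores_data, scores_unif, volume, t_grid):
    """Empirical Excess-Mass curve on a grid of t values.

    scores_data: scores of the (unlabeled) data, higher = more normal.
    scores_unif: scores of points drawn uniformly in the bounding box.
    volume:      Lebesgue volume of that bounding box.
    """
    levels = np.unique(scores_data)  # candidate thresholds u
    # Fraction of data points scoring at or above each threshold.
    mass = (scores_data[None, :] >= levels[:, None]).mean(axis=1)
    # Monte Carlo estimate of the volume of each level set {s >= u}.
    vol = (scores_unif[None, :] >= levels[:, None]).mean(axis=1) * volume
    em = (mass[None, :] - t_grid[:, None] * vol[None, :]).max(axis=1)
    return np.maximum(em, 0.0)  # the empty level set always gives 0

def toy_score(train, x):
    """Toy stand-in for a detector: negative distance to the training mean."""
    return -np.linalg.norm(x - train.mean(axis=0), axis=1)

def em_criterion_subsampled(X_train, X_test, score_fn, n_draws=10,
                            n_feats=2, n_unif=2000, tau_max=5.0,
                            n_t=50, seed=0):
    """Average area under the EM curve over random feature subsets."""
    rng = np.random.default_rng(seed)
    tau = np.linspace(0.0, tau_max, n_t)  # assumed grid: tau = t * volume
    areas = []
    for _ in range(n_draws):
        feats = rng.choice(X_train.shape[1], size=n_feats, replace=False)
        tr, te = X_train[:, feats], X_test[:, feats]
        lo, hi = te.min(axis=0), te.max(axis=0)
        volume = float(np.prod(hi - lo))
        unif = rng.uniform(lo, hi, size=(n_unif, n_feats))
        em = em_curve(score_fn(tr, te), score_fn(tr, unif), volume, tau / volume)
        # Trapezoidal area under the curve, integrated over tau.
        areas.append(np.sum((em[1:] + em[:-1]) / 2.0 * np.diff(tau)))
    return float(np.mean(areas))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(400, 5))
    X_test = rng.normal(size=(300, 5))
    inverted = lambda tr, x: -toy_score(tr, x)  # ranks outliers as normal
    print("toy scorer:     ", em_criterion_subsampled(X_train, X_test, toy_score))
    print("inverted scorer:", em_criterion_subsampled(X_train, X_test, inverted))
```

A larger value suggests a scoring function whose high-score regions concentrate more data in less volume. Because each draw works in only `n_feats` dimensions, the Monte Carlo volume estimate stays usable even when the full dataset is high-dimensional, which is the motivation for sub-sampling and aggregating.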

