LG, AI | Oct 3, 2022

Unsupervised Model Selection for Time-series Anomaly Detection

Mononito Goswami, Cristian Challu, Laurent Callot, Lenon Minorics, Andrey Kan

CMU

arXiv:2210.01078v3 | 19.544 citations | h-index: 14 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a practical challenge for practitioners in time-series anomaly detection, where labels are scarce, and offers an incremental improvement over existing methods.

The paper tackles the problem of selecting the most accurate anomaly detection model for time-series data without using labels. It proposes an unsupervised approach based on surrogate metrics and robust rank aggregation, and its experiments show that this approach is as effective as model selection based on partially labeled data.

Anomaly detection in time-series has a wide range of practical applications. While numerous anomaly detection methods have been proposed in the literature, a recent survey concluded that no single method is the most accurate across various datasets. To make matters worse, anomaly labels are scarce and rarely available in practice. The practical problem of selecting the most accurate model for a given dataset without labels has received little attention in the literature. This paper addresses that problem: given an unlabeled dataset and a set of candidate anomaly detectors, how can we select the most accurate model? To this end, we identify three classes of surrogate (unsupervised) metrics, namely prediction error, model centrality, and performance on injected synthetic anomalies, and show that some metrics are highly correlated with standard supervised anomaly detection performance metrics such as the $F_1$ score, but to varying degrees. We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem. We then provide theoretical justification for the proposed approach. Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model based on partially labeled data.
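To make the idea of combining several imperfect surrogate metrics concrete, the following is a minimal sketch that ranks candidate models under each metric and merges the rankings with a plain Borda count. Borda count is a standard rank aggregation rule used here as a simple stand-in; the abstract does not say which robust aggregation procedure the authors use. The model names, the score values, and the helper names `ranks_from_scores` and `borda_aggregate` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Turn per-model scores into ranks, where rank 0 is the best model."""
    ordered = sorted(scores, key=lambda m: scores[m], reverse=higher_is_better)
    return {model: position for position, model in enumerate(ordered)}


def borda_aggregate(rankings: List[Dict[str, int]]) -> List[str]:
    """Merge several rankings with a Borda count (illustrative stand-in only)."""
    models = sorted(rankings[0])
    n = len(models)
    # A model ranked r-th (0-based) in one ranking earns n - 1 - r points.
    points = {m: sum(n - 1 - ranking[m] for ranking in rankings) for m in models}
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(models, key=lambda m: (-points[m], m))


# Hypothetical surrogate metric values for three candidate detectors.
prediction_error = {"lstm": 0.12, "knn": 0.30, "iforest": 0.25}     # lower is better
centrality_distance = {"lstm": 1.5, "knn": 2.0, "iforest": 1.0}     # lower is better
synthetic_f1 = {"lstm": 0.80, "knn": 0.55, "iforest": 0.70}         # higher is better

rankings = [
    ranks_from_scores(prediction_error, higher_is_better=False),
    ranks_from_scores(centrality_distance, higher_is_better=False),
    ranks_from_scores(synthetic_f1, higher_is_better=True),
]

consensus = borda_aggregate(rankings)
selected_model = consensus[0]
print(consensus)  # ['lstm', 'iforest', 'knn']
```

In this toy example the metrics disagree on the top model (the centrality metric prefers `iforest`), yet the aggregated ranking still selects `lstm` because it ranks well under the other two. A robust variant would additionally down-weight or discard metrics that disagree strongly with the rest.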

