SE · AI · LG · Jan 10, 2024

MTAD: Tools and Benchmarks for Multivariate Time Series Anomaly Detection

arXiv:2401.06175v1 · 9 citations · h-index: 21
Originality: Synthesis-oriented
AI Analysis

This work addresses inconsistent and potentially misleading evaluation in KPI anomaly detection for software system reliability. It serves both researchers and engineers, but its contribution is incremental: it focuses on benchmarking rather than introducing a new detection method.

The paper tackles two problems in multivariate time series anomaly detection for Key Performance Indicators (KPIs): the lack of rigorous comparison among methods and the use of inconsistent evaluation processes. It provides a comprehensive review and evaluation of twelve state-of-the-art methods, proposes a novel metric called salience, and reports benchmark results on five publicly available datasets.

Key Performance Indicators (KPIs) are essential time-series metrics for ensuring the reliability and stability of many software systems. They faithfully record runtime states to facilitate the understanding of anomalous system behaviors and provide informative clues for engineers to pinpoint the root causes. The unprecedented scale and complexity of modern software systems, however, make the volume of KPIs explode. Consequently, many traditional methods of KPI anomaly detection become impractical, which serves as a catalyst for the fast development of machine learning-based solutions in both academia and industry. However, there is currently a lack of rigorous comparison among these KPI anomaly detection methods, and re-implementation demands a non-trivial effort. Moreover, we observe that different works adopt independent evaluation processes with different metrics. Some of them may not fully reveal the capability of a model and some are creating an illusion of progress. To better understand the characteristics of different KPI anomaly detectors and address the evaluation issue, in this paper, we provide a comprehensive review and evaluation of twelve state-of-the-art methods, and propose a novel metric called salience. Particularly, the selected methods include five traditional machine learning-based methods and seven deep learning-based methods. These methods are evaluated with five multivariate KPI datasets that are publicly available. A unified toolkit with easy-to-use interfaces is also released. We report the benchmark results in terms of accuracy, salience, efficiency, and delay, which are of practical importance for industrial deployment. We believe our work can contribute as a basis for future academic research and industrial application.

Code Implementations: 1 repo
