SE · AI · LG · Jan 10, 2024

MTAD: Tools and Benchmarks for Multivariate Time Series Anomaly Detection

Jinyang Liu, Wenwei Gu, Zhuangbin Chen, Yichen Li, Yuxin Su, Michael R. Lyu

arXiv:2401.06175v1 · 4.79 citations · h-index: 21 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses inconsistent and potentially misleading evaluation practices in KPI anomaly detection for software system reliability. It serves both researchers and engineers, though its contribution is incremental: it focuses on benchmarking rather than introducing a new detection method.

The paper tackles two gaps in multivariate time series anomaly detection for Key Performance Indicators (KPIs): the lack of rigorous comparison among methods and the inconsistent evaluation processes used across prior work. It provides a comprehensive review and evaluation of twelve state-of-the-art methods, proposes a novel metric called salience, and reports benchmark results on five publicly available datasets.

Key Performance Indicators (KPIs) are essential time-series metrics for ensuring the reliability and stability of many software systems. They faithfully record runtime states to facilitate the understanding of anomalous system behaviors and provide informative clues for engineers to pinpoint the root causes. The unprecedented scale and complexity of modern software systems, however, make the volume of KPIs explode. Consequently, many traditional methods of KPI anomaly detection become impractical, which serves as a catalyst for the fast development of machine learning-based solutions in both academia and industry. However, there is currently a lack of rigorous comparison among these KPI anomaly detection methods, and re-implementation demands a non-trivial effort. Moreover, we observe that different works adopt independent evaluation processes with different metrics. Some of them may not fully reveal the capability of a model and some are creating an illusion of progress. To better understand the characteristics of different KPI anomaly detectors and address the evaluation issue, in this paper, we provide a comprehensive review and evaluation of twelve state-of-the-art methods, and propose a novel metric called salience. Particularly, the selected methods include five traditional machine learning-based methods and seven deep learning-based methods. These methods are evaluated with five multivariate KPI datasets that are publicly available. A unified toolkit with easy-to-use interfaces is also released. We report the benchmark results in terms of accuracy, salience, efficiency, and delay, which are of practical importance for industrial deployment. We believe our work can contribute as a basis for future academic research and industrial application.
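To make three of the reported dimensions concrete, the sketch below computes pointwise accuracy (F1), detection delay, and wall-clock efficiency for a toy threshold detector on a tiny hand-made multivariate series. Everything here is an illustrative assumption: the function names, the data, and the detector are hypothetical and are not the MTAD toolkit's interface or the paper's exact metric definitions. Salience is omitted because the abstract does not define it.

```python
import time

def anomaly_segments(labels):
    """Return (start, end) index pairs (end exclusive) for runs of 1s."""
    segments, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments

def f1_score(labels, preds):
    """Pointwise F1 over binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def detection_delays(labels, preds):
    """Steps from each labeled segment's start to its first alarm (None if missed)."""
    delays = []
    for start, end in anomaly_segments(labels):
        hits = [i - start for i in range(start, end) if preds[i]]
        delays.append(hits[0] if hits else None)
    return delays

def max_abs_detector(series, threshold):
    """Toy detector (assumption): flag a step if any KPI's magnitude exceeds the threshold."""
    return [int(max(abs(v) for v in row) > threshold) for row in series]

# Two KPIs over eight time steps; steps 3-5 are labeled anomalous.
series = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1], [0.4, 0.3],
          [2.5, 0.2], [3.0, 2.8], [0.1, 0.0], [0.2, 0.1]]
labels = [0, 0, 0, 1, 1, 1, 0, 0]

t0 = time.perf_counter()
preds = max_abs_detector(series, threshold=1.0)
elapsed = time.perf_counter() - t0  # efficiency: seconds spent on inference

print("F1:", f1_score(labels, preds))
print("delays:", detection_delays(labels, preds))
print("seconds:", elapsed)
```

In this toy case the detector misses the first anomalous step, so the pointwise F1 is below 1 and the delay for the single segment is one step. Reporting delay alongside accuracy shows how late an alarm fires, which a pointwise score alone does not reveal.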
