Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
This work addresses the challenge of monitoring business-critical KPIs without labels at large Internet companies, offering a practical solution with theoretical backing.
The paper tackles the problem of unsupervised anomaly detection for seasonal KPIs in web applications, proposing Donut, a VAE-based algorithm whose best F-scores range from 0.75 to 0.9 on the studied KPIs, outperforming a state-of-the-art supervised ensemble approach and a baseline VAE approach.
To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We also present a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.
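To make the "best F-score" figure concrete, the following is a minimal sketch of how such a number could be computed from per-point anomaly scores and ground-truth labels. It assumes the best F-score is the maximum F1 over all candidate thresholds on the scores; the abstract does not describe the paper's exact evaluation protocol, so this may differ from it. The function names `f_score` and `best_f_score` and the sample data are hypothetical and serve only as illustration.

```python
from typing import List, Tuple


def f_score(predicted: List[bool], actual: List[bool]) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false positives
    fn = sum(a and not p for p, a in zip(predicted, actual))  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f_score(scores: List[float], actual: List[bool]) -> Tuple[float, float]:
    """Try every observed score as a threshold; return (best F1, threshold).

    Assumption: a point is flagged as anomalous when its score is greater
    than or equal to the threshold (higher score = more anomalous).
    """
    best = (0.0, float("inf"))
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        f1 = f_score(predicted, actual)
        if f1 > best[0]:
            best = (f1, threshold)
    return best


if __name__ == "__main__":
    # Hypothetical anomaly scores for five KPI points, two of them anomalous.
    scores = [0.1, 0.2, 0.9, 0.8, 0.3]
    labels = [False, False, True, True, False]
    print(best_f_score(scores, labels))
```

Sweeping only the observed score values is sufficient here, because the set of flagged points changes only when the threshold crosses one of them.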