LG | Dec 13, 2021

Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance

arXiv:2112.08940v1
Originality: Incremental advance
AI Analysis

This work addresses operational reliability problems for VMware's enterprise customers by keeping an anomaly detection system effective, though it appears incremental because it builds on an existing deployed system.

VMware tackled label scarcity and data drift challenges in their ML-based anomaly detection system for enterprise performance issues, improving model accuracy by 30% and preventing performance degradation over time.

We discuss how VMware is solving the following challenges to harness data to operate our ML-based anomaly detection system to detect performance issues in our Software Defined Data Center (SDDC) enterprise deployments: (i) label scarcity and label bias due to heavy dependency on unscalable human annotators, and (ii) data drifts due to ever-changing workload patterns, software stack and underlying hardware. Our anomaly detection system has been deployed in production for many years and has successfully detected numerous major performance issues. We demonstrate that by addressing these data challenges, we not only improve the accuracy of our performance anomaly detection model by 30%, but also ensure that the model's performance never degrades over time.

