High Significant Fault Detection in Azure Core Workload Insights
This work addresses the need of Azure Core users to monitor faults efficiently in complex time-series data, though it appears incremental because it builds on existing anomaly detection methods.
The paper tackles the problem of identifying high-significance anomalies in Azure Core workload time-series data for display on user dashboards, with a target of reporting only 5-20 anomalies per hour, each readily perceptible to users and carrying a high reconstruction error.
Azure Core workload insights consist of time-series data with different metric units. Faults, or anomalies, are observed in these time series and are associated with the metric name, resource region, dimensions, and dimension values of the data. For Azure Core, an important task is to highlight faults or anomalies on a dashboard in a way the user can perceive easily. The reported anomalies should be highly significant and limited in number, e.g., 5-20 anomalies per hour. They will be clearly perceptible to the user and will show a high reconstruction error under any time-series forecasting model. Hence, our task is to automatically identify 'high significant anomalies' and their associated information for user perception.
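The selection idea described above (score points by reconstruction error, then report only a limited number per hour) can be illustrated with a minimal sketch. This is not the paper's method: the trailing-mean forecast is a stand-in for whatever forecasting model is actually used, and the function names, the series keys (metric name, region, dimension, dimension value), and the sample values are all hypothetical.

```python
from statistics import mean


def reconstruction_errors(values, window=3):
    """Absolute error of a trailing-mean forecast (a stand-in model, assumed here).

    The first point has no history, so its error is defined as 0.
    """
    errors = []
    for i, actual in enumerate(values):
        history = values[max(0, i - window):i]
        predicted = mean(history) if history else actual
        errors.append(abs(actual - predicted))
    return errors


def top_anomalies(series_by_key, max_reported=20, window=3):
    """Score every point of every series and keep the highest-error points.

    series_by_key maps a hypothetical key
    (metric name, region, dimension, dimension value)
    to the values observed for that series within one hour.
    Returns a list of (error, key, index) tuples, largest error first.
    """
    candidates = []
    for key, values in series_by_key.items():
        for index, error in enumerate(reconstruction_errors(values, window)):
            candidates.append((error, key, index))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:max_reported]


# Hypothetical one-hour sample: one series with a spike, one stable series.
hour_of_data = {
    ("cpu_percent", "westus", "vm_size", "D4"): [10, 11, 10, 12, 11, 95, 12],
    ("disk_iops", "eastus", "disk_type", "premium"): [500, 505, 498, 502, 501, 499, 503],
}
report = top_anomalies(hour_of_data, max_reported=2)
for error, key, index in report:
    print(key, "point", index, "error", round(error, 2))
```

Each reported entry keeps its series key, so the metric name, region, and dimension information travel with the anomaly to the dashboard. Capping the list with `max_reported` is one simple way to stay within the 5-20 per hour budget; a real system would likely also need to merge adjacent points, since the point right after a spike also scores high under a trailing-mean forecast.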