AIFeb 9, 2024

On the Fly Detection of Root Causes from Observed Data with Application to IT Systems

arXiv:2402.06500v21 citationsh-index: 14CIKM
AI Analysis

This work addresses anomaly detection for IT system monitoring, presenting an incremental improvement with a novel method for a known bottleneck.

The paper tackles the problem of detecting root causes of anomalies in IT systems by introducing a tailored structural causal model and a new algorithm, with experiments showing superior performance on both synthetic and real data.

This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven to be correct; while an extension is proposed based on the intervention of an agent to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or real IT monitoring data.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes