LG | CV | Mar 28, 2022

Semi-supervised anomaly detection algorithm based on KL divergence (SAD-KL)

arXiv:2203.14539v1 | 8 citations | h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses a distribution gap between labeled and unlabeled data in anomaly detection, with applications such as fraud or fault monitoring. The advance is incremental because it builds on existing LOF and KL divergence methods.

The paper tackles semi-supervised anomaly detection in the case where the unlabeled data may not follow the same distribution as the labeled normal data. It proposes SAD-KL, which estimates the KL divergence between LOF distributions and sets detection thresholds iteratively. The authors report a higher detection probability and less learning time than existing algorithms.

In semi-supervised anomaly detection, unlabeled data are generally assumed to be normal. This assumption, however, causes unavoidable detection errors when the distribution of the unlabeled data differs from that of the labeled normal data. To address the problem caused by this distribution gap between labeled and unlabeled data, we propose a semi-supervised anomaly detection algorithm using KL divergence (SAD-KL). The proposed SAD-KL consists of two steps: (1) estimating the KL divergence between the probability density functions (PDFs) of the local outlier factors (LOFs) of the labeled normal data and of the unlabeled data, and (2) using that KL divergence to estimate the detection probability and the threshold for detecting normal data in the unlabeled data. We show that the PDFs of the LOFs follow a Burr distribution and use them for detection. Once the threshold is computed, SAD-KL runs iteratively until the labeling change rate falls below a predefined threshold. Experimental results show that SAD-KL achieves a higher detection probability than existing algorithms while requiring less learning time.
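
The two steps and the stopping rule described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration. The LOF PDFs are approximated with shared-bin histograms instead of fitted Burr distributions. The rule that maps the KL divergence to a threshold is a placeholder, because the abstract does not state the paper's formula. All function and parameter names (`knn`, `lof`, `kl_divergence`, `sad_kl_sketch`, `k`, `tol`, `max_iter`) are hypothetical.

```python
import math
import random


def knn(q, ref, k, skip=None):
    """k nearest (distance, index) pairs of q within ref, optionally skipping one index."""
    return sorted((math.dist(q, r), j) for j, r in enumerate(ref) if j != skip)[:k]


def lof(queries, ref, k=5, self_idx=None):
    """Local outlier factor of each query relative to ref.

    self_idx[i] is the position of queries[i] inside ref (or None), so that a
    point is never counted as its own neighbour.
    """
    self_idx = self_idx or [None] * len(queries)
    ref_nbrs = [knn(r, ref, k, skip=j) for j, r in enumerate(ref)]
    kdist = [n[-1][0] for n in ref_nbrs]  # k-distance of every reference point

    def lrd(nbrs):  # local reachability density
        reach = [max(kdist[j], d) for d, j in nbrs]
        return len(reach) / max(sum(reach), 1e-12)

    ref_lrd = [lrd(n) for n in ref_nbrs]
    scores = []
    for q, s in zip(queries, self_idx):
        nbrs = ref_nbrs[s] if s is not None else knn(q, ref, k)
        scores.append(sum(ref_lrd[j] for _, j in nbrs) / (len(nbrs) * lrd(nbrs)))
    return scores


def kl_divergence(p_samples, q_samples, bins=20, eps=1e-9):
    """Histogram estimate of KL(p || q). ASSUMPTION: stands in for the Burr-based PDFs."""
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [(c + eps) / (len(xs) + eps * bins) for c in counts]  # smoothed, sums to 1

    p, q = hist(p_samples), hist(q_samples)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sad_kl_sketch(labeled, unlabeled, k=5, tol=0.01, max_iter=20):
    """Iteratively mark each unlabeled point as normal (True) or anomalous (False)."""
    is_normal = [False] * len(unlabeled)  # start from the labeled normal data only
    thr = kl = float("nan")
    for _ in range(max_iter):
        accepted = [i for i, ok in enumerate(is_normal) if ok]
        pool = labeled + [unlabeled[i] for i in accepted]
        pos = {i: len(labeled) + n for n, i in enumerate(accepted)}
        lof_lab = lof(labeled, pool, k, list(range(len(labeled))))
        lof_unl = lof(unlabeled, pool, k, [pos.get(i) for i in range(len(unlabeled))])
        # Step 1: divergence between the two LOF distributions.
        kl = kl_divergence(lof_lab, lof_unl)
        # Step 2: PLACEHOLDER rule (not from the paper): a larger divergence
        # gives a stricter quantile of the labeled LOFs as the threshold.
        level = 0.5 + 0.5 * math.exp(-kl)
        thr = sorted(lof_lab)[min(int(level * len(lof_lab)), len(lof_lab) - 1)]
        new = [s <= thr for s in lof_unl]
        change = sum(a != b for a, b in zip(new, is_normal)) / max(len(unlabeled), 1)
        is_normal = new
        if change < tol:  # labeling change rate below the predefined threshold
            break
    return is_normal, thr, kl


# Synthetic demo: 40 normal points plus 10 far-away anomalies in the unlabeled set.
random.seed(0)
labeled = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(60)]
normal_unl = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
anomalies = [(25 * math.cos(t), 25 * math.sin(t)) for t in range(10)]
labels, thr, kl = sad_kl_sketch(labeled, normal_unl + anomalies)
print(sum(labels[:40]), "of 40 normal points accepted;",
      sum(labels[40:]), "of 10 anomalies accepted")
```

The sketch uses only the standard library so that it runs anywhere. In practice, `sklearn.neighbors.LocalOutlierFactor` and a fitted `scipy.stats.burr12` would be natural replacements for the hand-written LOF and the histogram. The brute-force neighbour search is quadratic in the pool size and is suitable only for small data sets.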

