Impact of Inaccurate Contamination Ratio on Robust Unsupervised Anomaly Detection
This addresses a practical issue for users of anomaly detection systems, where data contamination is common, but the findings are incremental because they build on existing robust models.
The study examined how inaccurate contamination ratios affect robust unsupervised anomaly detection models, finding that these models are not harmed and can even perform better when given incorrect contamination information, as shown on six benchmark datasets.
Training data sets for unsupervised anomaly detection are typically presumed to be anomaly-free, yet they often contain anomalies (contamination), a challenge that significantly undermines model performance. Most robust unsupervised anomaly detection models rely on contamination ratio information to handle such contamination. In practice, however, the contamination ratio may be inaccurate. We investigate the impact of inaccurate contamination ratio information on robust unsupervised anomaly detection and verify whether these models are resilient to misinformed contamination ratios. Our investigation on six benchmark data sets reveals that such models are not adversely affected by exposure to misinformation. In fact, they can exhibit improved performance when provided with inaccurate contamination ratios.
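To make "relying on contamination ratio information" concrete, one common way a ratio can be used is to flag the top fraction of training samples by anomaly score as suspected contamination and exclude them before (re)fitting. The sketch below is a minimal, hypothetical illustration of that general idea only. The helper names `flag_by_contamination` and `trimmed_training_set` and the score-trimming rule are assumptions, not the mechanism of the models evaluated in the study. It shows how an under- or over-estimated ratio changes which samples get flagged.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def flag_by_contamination(scores: Sequence[float], ratio: float) -> List[bool]:
    """Flag the round(ratio * n) highest-scoring samples as suspected anomalies.

    Hypothetical helper illustrating a generic contamination-ratio heuristic;
    it is not the specific method of any model studied in the paper.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n = len(scores)
    k = round(ratio * n)  # number of samples the ratio says are contaminated
    # Rank sample indices by descending anomaly score; the top k are flagged.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(n)]


def trimmed_training_set(samples: Sequence[T], scores: Sequence[float], ratio: float) -> List[T]:
    """Drop flagged samples so a model could be refit on the remainder (assumed usage)."""
    flags = flag_by_contamination(scores, ratio)
    return [x for x, f in zip(samples, flags) if not f]


if __name__ == "__main__":
    # Toy anomaly scores: indices 3 and 7 are the true anomalies (true ratio 0.2).
    scores = [0.10, 0.20, 0.15, 0.90, 0.05, 0.30, 0.25, 0.95, 0.12, 0.18]
    true_anomalies = {3, 7}
    for ratio in (0.1, 0.2, 0.3):  # under-estimated, correct, over-estimated
        flags = flag_by_contamination(scores, ratio)
        flagged = {i for i, f in enumerate(flags) if f}
        caught = len(flagged & true_anomalies)
        print(ratio, sorted(flagged), caught)
    # 0.1 [7] 1
    # 0.2 [3, 7] 2
    # 0.3 [3, 5, 7] 2
```

In this toy setting, an under-estimated ratio leaves one anomaly in the training set, while an over-estimated ratio removes every anomaly at the cost of also discarding one normal sample. Whether real robust models tolerate such errors is exactly the question the study examines empirically.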