Meta-Learning for Automated Selection of Anomaly Detectors for Semi-Supervised Datasets
This work addresses the challenge of automating anomaly detector selection for practitioners in fields such as cybersecurity and fraud detection. It appears incremental, however, since it builds on existing meta-learning and anomaly detection concepts.
The paper addresses the selection of an anomaly detector for semi-supervised datasets, where only normal data is available during training. It uses meta-learning to predict performance metrics such as the Matthews correlation coefficient (MCC) from meta-features that can be computed on normal data alone, and reports promising initial results.
In anomaly detection, a prominent task is to induce a model that identifies anomalies after being trained solely on normal data. Generally, one is interested in finding an anomaly detector that correctly identifies anomalies, i.e., data points that do not belong to the normal class, without raising too many false alarms. Which anomaly detector is best suited depends on the dataset at hand, so the choice needs to be tailored to it. The quality of an anomaly detector may be assessed via confusion-based metrics such as the Matthews correlation coefficient (MCC). However, since only normal data is available during training in a semi-supervised setting, such metrics cannot be computed at training time. To facilitate automated machine learning for anomaly detectors, we propose to employ meta-learning to predict MCC scores from metrics that can be computed with normal data only. First promising results are obtained using the hypervolume and the false positive rate as meta-features.
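To make the quantities above concrete, the following is a minimal Python sketch, not the authors' implementation. It shows the standard MCC formula computed from confusion-matrix counts, a false positive rate estimated from held-out normal data only, and a hypothetical nearest-neighbour meta-learner that predicts an MCC score from a meta-feature vector. The function names, the choice of a k-nearest-neighbour regressor, and the layout of the meta-dataset are assumptions made for illustration; the abstract does not specify them, and the hypervolume meta-feature is not computed here.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def false_positive_rate(flags_on_normal):
    """Fraction of held-out normal points that a detector flags as anomalous.

    Needs no labelled anomalies, so it is available in the semi-supervised setting.
    """
    return sum(flags_on_normal) / len(flags_on_normal)


def predict_mcc_knn(meta_features, meta_dataset, k=1):
    """Hypothetical meta-learner (an assumption, not the paper's model).

    meta_dataset is a list of (meta_feature_vector, observed_mcc) pairs collected
    on datasets where labelled anomalies were available for evaluation.
    Returns the mean MCC of the k records closest in meta-feature space.
    """
    ranked = sorted(meta_dataset, key=lambda rec: math.dist(meta_features, rec[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)
```

In such a setup, one would compute the meta-features for each candidate detector on the new dataset, predict an MCC score for each, and select the detector with the highest prediction.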