Performance Metric for Multiple Anomaly Score Distributions with Discrete Severity Levels
This work addresses a specific need in smart factory maintenance for better performance metrics in anomaly severity classification, representing an incremental improvement in evaluation methods.
The paper tackles the problem of evaluating anomaly detection models that classify severity levels. It proposes WS-AUROC, a new metric that combines AUROC with penalties for severity level differences, and also proposes an anomaly detector that outperforms its ablation models on both WS-AUROC and AUROC.
The rise of smart factories has heightened the demand for automated maintenance, and normal-data-based anomaly detection has proved particularly effective in environments where anomaly data are scarce. This method, which does not require anomaly data during training, has prompted researchers to focus not only on detecting anomalies but also on classifying severity levels by using anomaly scores. However, existing performance metrics, such as the area under the receiver operating characteristic curve (AUROC), do not effectively reflect how well a model classifies severity levels based on anomaly scores. To address this limitation, we propose the weighted sum of the area under the receiver operating characteristic curve (WS-AUROC), which combines AUROC with a penalty for severity level differences. We conducted experiments with three penalty assignment methods: a uniform penalty regardless of severity level differences, a penalty based on severity level index differences, and a penalty based on the actual physical quantities that cause anomalies. The last of these was the most sensitive. Additionally, we propose an anomaly detector that achieves clear separation of the score distributions and outperforms the ablation models on both the WS-AUROC and AUROC metrics.
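The abstract does not state the WS-AUROC formula, so the following is only a minimal sketch of how such a metric could be computed. It assumes that WS-AUROC is a weighted average of pairwise AUROCs between severity levels, normalized by the sum of the penalty weights. The function names, the example scores, and the physical magnitudes are all hypothetical and are not taken from the paper. The three penalty functions mirror the three assignment methods described above.

```python
from itertools import combinations


def pairwise_auroc(low_scores, high_scores):
    """AUROC between two severity levels: the fraction of (low, high) pairs
    in which the higher-severity sample has the larger anomaly score.
    Ties count as half a win."""
    wins = 0.0
    for h in high_scores:
        for l in low_scores:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(low_scores) * len(high_scores))


def ws_auroc(scores_by_level, penalty):
    """Assumed formulation (not confirmed by the paper): penalty-weighted
    average of pairwise AUROCs over all pairs of severity levels.

    scores_by_level: list of score lists, ordered by increasing severity.
    penalty: function (i, j) -> positive weight for the level pair i < j.
    """
    total = 0.0
    weight_sum = 0.0
    for i, j in combinations(range(len(scores_by_level)), 2):
        w = penalty(i, j)
        total += w * pairwise_auroc(scores_by_level[i], scores_by_level[j])
        weight_sum += w
    return total / weight_sum


# Hypothetical anomaly scores for three severity levels (normal, mild, severe).
levels = [
    [0.10, 0.20, 0.30],
    [0.25, 0.40, 0.50],
    [0.35, 0.80, 0.90],
]

# Hypothetical physical magnitude behind each level (e.g. a defect size).
physical = [0.0, 0.5, 2.0]


def uniform_penalty(i, j):
    # Same weight for every pair of levels.
    return 1.0


def index_penalty(i, j):
    # Weight grows with the gap between severity level indices.
    return float(abs(j - i))


def physical_penalty(i, j):
    # Weight grows with the gap between the underlying physical quantities.
    return abs(physical[j] - physical[i])


ws_uniform = ws_auroc(levels, uniform_penalty)
ws_index = ws_auroc(levels, index_penalty)
ws_physical = ws_auroc(levels, physical_penalty)
print(ws_uniform, ws_index, ws_physical)
```

Under this assumed formulation, the same set of anomaly scores yields a different value for each penalty scheme, because confusions between distant severity levels are weighted more heavily than confusions between adjacent ones.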