LG May 31, 2023

Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR'16

arXiv:2305.19770v2 (2 citations)
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset bias and sensitivity in network anomaly detection. It shows researchers and practitioners that common benchmarking practices can be misleading, and it is an incremental contribution to methodology.

The paper examines how anomaly detection models are evaluated. It shows that minor modifications to the UGR'16 dataset affect performance more than the choice of ML technique, with performance variations of up to 20% in some cases, and it highlights labelling inaccuracies that make the measured performance uncertain.

Autonomous or self-driving networks are expected to provide a solution to the myriad of extremely demanding new applications with minimal human supervision. For this purpose, the community relies on the development of new Machine Learning (ML) models and techniques. However, ML can only be as good as the data it is fitted with, and data quality is an elusive concept that is difficult to assess. In this paper, we show that relatively minor modifications to a benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) have significantly more impact on model performance than the specific ML technique considered. We also show that the measured model performance is uncertain as a result of labelling inaccuracies. Our findings illustrate that the widely adopted approach of comparing a set of models in terms of performance results (e.g., in terms of accuracy or ROC curves) may lead to incorrect conclusions when done without a proper understanding of dataset biases and sensitivity. We contribute a methodology to interpret a model response that can be useful for this understanding.
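
To make the point about labelling inaccuracies concrete, the following minimal sketch computes how far the true accuracy of a binary detector could lie from the measured one if some labels are wrong. It is an illustration only, not the methodology contributed by the paper; the function names, the toy data, and the assumption that at most `max_wrong_labels` labels are incorrect are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def accuracy_bounds(predictions, labels, max_wrong_labels):
    """Range of the true accuracy for binary labels, assuming that at most
    `max_wrong_labels` of the given labels are incorrect."""
    n = len(labels)
    matches = sum(p == y for p, y in zip(predictions, labels))
    mismatches = n - matches
    # Worst case: every wrong label sits on an apparent match,
    # so correcting it turns the match into a mismatch.
    lowest = (matches - min(max_wrong_labels, matches)) / n
    # Best case: every wrong label sits on an apparent mismatch,
    # so correcting it turns the mismatch into a match.
    highest = (matches + min(max_wrong_labels, mismatches)) / n
    return lowest, highest


# Toy data (hypothetical): 1 = anomalous flow, 0 = normal flow.
predictions = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]

measured = accuracy(predictions, labels)
low, high = accuracy_bounds(predictions, labels, max_wrong_labels=1)
print(f"measured accuracy: {measured:.2f}, possible range: [{low:.2f}, {high:.2f}]")
```

In this toy case a single doubtful label out of ten already widens the measured accuracy into an interval 0.2 wide, which can exceed the gap between two competing models evaluated on the same data.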

