Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice
This work addresses a practical problem, helping users select an appropriate IDS, but its contribution is incremental: it provides empirical comparisons without introducing new methods.
The paper empirically compares recent intrusion detection systems (IDSs) to help users choose a solution based on their requirements. It finds that no single IDS is universally best: performance depends on factors such as attack types and datasets, and although a deep neural network achieved the highest average F1 score, it was not the top performer on every dataset.
Our paper empirically compares recent IDSs to give users an objective basis for choosing the most appropriate solution for their requirements. Our results show that no single solution is best; performance depends on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, the BoT_IoT and Stratosphere IoT datasets both capture IoT-related attacks, but the deep neural network performed best on the BoT_IoT dataset while HELAD performed best on the Stratosphere IoT dataset. So although a deep neural network solution had the highest average F1 score across the tested datasets, it is not always the best-performing one. We further discuss difficulties in using IDSs from literature and project repositories, which complicated drawing definitive conclusions regarding IDS selection.
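The gap between "highest average F1" and "best on every dataset" can be illustrated with a minimal Python sketch. The dataset names, system names, and scores below are hypothetical placeholders invented for illustration. They are not results from the paper, and the helper functions are assumptions rather than the authors' evaluation code.

```python
from statistics import mean

# Hypothetical F1 scores, invented purely for illustration.
# These are NOT results reported in the paper.
f1_scores = {
    "dataset_A": {"ids_x": 0.90, "ids_y": 0.70},
    "dataset_B": {"ids_x": 0.80, "ids_y": 0.85},
}


def best_per_dataset(scores):
    """Return the system with the highest F1 score on each dataset."""
    return {ds: max(by_ids, key=by_ids.get) for ds, by_ids in scores.items()}


def average_f1(scores):
    """Return each system's mean F1 score over the datasets it was run on."""
    systems = {name for by_ids in scores.values() for name in by_ids}
    return {
        name: mean(by_ids[name] for by_ids in scores.values() if name in by_ids)
        for name in systems
    }


winners = best_per_dataset(f1_scores)
averages = average_f1(f1_scores)
top_on_average = max(averages, key=averages.get)

print("Best per dataset:", winners)
print("Top on average:", top_on_average)
```

With these placeholder numbers, the system that leads on average still loses on one dataset, which is why a per-dataset breakdown may be more useful than a single aggregate score when choosing an IDS for a specific environment.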