Machine Learning for Network-based Intrusion Detection Systems: an Analysis of the CIDDS-001 Dataset
This work addresses the need for effective intrusion detection systems to protect digital networks, but it is incremental as it explores an underutilized label in an existing dataset.
The study compared K-Nearest Neighbours and Random Forest models trained on the CIDDS-001 dataset using the Class label versus the AttackType label for network intrusion detection, finding that AttackType produced reliable results similar to Class.
As corporations and the general public rely increasingly on digital data and computer networks, cyber attacks have become a serious threat to the normal functioning of society. Intrusion detection systems seek to address this threat by detecting attacks in real time and attempting to block them or minimize their damage. These systems can operate in many ways, some of which are based on artificial intelligence methods. Datasets containing both normal network traffic and cyber attacks are used to train these algorithms so that they can learn the underlying patterns of network-based data. CIDDS-001 is one of the most widely used datasets for network-based intrusion detection research. In the majority of works published so far on this dataset, the Class label was used for training machine learning algorithms. However, CIDDS-001 contains another label, AttackType, that seems very promising for this purpose and remains largely unexplored. This work compares two machine learning models, K-Nearest Neighbours and Random Forest, each trained with both labels, in order to ascertain whether AttackType can produce reliable results in comparison with the Class label.
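The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn and purely synthetic data. Nothing here reproduces the study's preprocessing, features, hyperparameters, or results: the helper names `make_synthetic_flows` and `compare_labels`, the three toy features, the label values, and the macro F1 metric are all assumptions made for illustration. In particular, the coarse label below is only a stand-in for Class and the fine-grained label a stand-in for AttackType.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def make_synthetic_flows(n_per_group=200, seed=0):
    """Build toy flow features plus two label columns (all values invented)."""
    rng = np.random.default_rng(seed)
    # Hypothetical fine-grained categories, standing in for AttackType.
    attack_types = ["normal", "dos", "portScan", "bruteForce"]
    features, fine_labels = [], []
    for i, name in enumerate(attack_types):
        # Each category gets its own cluster centre over three toy features.
        centre = np.array([i * 3.0, (i % 2) * 3.0, i * 1.5])
        features.append(rng.normal(loc=centre, scale=1.0, size=(n_per_group, 3)))
        fine_labels += [name] * n_per_group
    X = np.vstack(features)
    fine = np.array(fine_labels)
    # Coarse stand-in for Class: collapse every attack into one category.
    # This is a simplification, not the dataset's actual Class values.
    coarse = np.where(fine == "normal", "normal", "attack")
    return X, coarse, fine


def compare_labels(X, labels_by_name, seed=0):
    """Train KNN and Random Forest once per label and return macro F1 scores."""
    model_factories = {
        "knn": lambda: KNeighborsClassifier(n_neighbors=5),
        "random_forest": lambda: RandomForestClassifier(
            n_estimators=50, random_state=seed
        ),
    }
    scores = {}
    for label_name, y in labels_by_name.items():
        # Same features and split settings for both labels; only the target changes.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed
        )
        for model_name, make_model in model_factories.items():
            model = make_model().fit(X_train, y_train)
            predictions = model.predict(X_test)
            scores[(label_name, model_name)] = f1_score(
                y_test, predictions, average="macro"
            )
    return scores


if __name__ == "__main__":
    X, coarse, fine = make_synthetic_flows()
    results = compare_labels(X, {"coarse": coarse, "fine": fine})
    for (label_name, model_name), score in sorted(results.items()):
        print(f"{label_name:>6} | {model_name:<13} | macro F1 = {score:.3f}")
```

Holding the features, split, and models fixed while swapping only the target label is what makes the two sets of scores comparable; macro averaging is used here so that rare categories weigh as much as common ones, which is one reasonable choice among several.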