CR, AI | Nov 21, 2024

The importance of the clustering model to detect new types of intrusion in data traffic

Noor Saud Abd, Noor Walid Khalid, Basim Hussein Ali

arXiv:2411.14550v2 | 1 citation | h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of identifying emerging cyber threats without labeled data, but it is incremental as it uses an existing method on new data.

The paper tackled the problem of detecting new types of cyber intrusions in unclassified data traffic by applying the K-means clustering algorithm, which successfully identified and counted attacks in both custom-generated and public datasets.

In the current digital age, the volume of data generated by cyber activity is enormous and constantly increasing. This data may contain valuable insights that can be harnessed to improve cybersecurity measures. However, much of it is unclassified and qualitative, which poses significant challenges for traditional analysis methods. Clustering helps reveal hidden patterns and structures by grouping similar data points, which makes threats simpler to identify and address. Clustering is a data mining (DM) approach that uses similarity calculations to divide a data set into several categories; typical families are hierarchical, density-based, and partitioning algorithms. The present work uses K-means, a popular partitioning clustering technique. Using K-means, we worked with two types of data. First, we gathered our own data and applied the XGBoost algorithm after completing the aggregation with K-means. The data was gathered using the Kali Linux environment, CICFlowMeter traffic, and the PuTTY software tool, with diverse, simple attacks. Because cyber threats are dynamic, new attack types often emerge for which labeled data does not yet exist; this approach could help identify attack types that are distinct from known attacks and label them based on the characteristics they exhibit. The model counted the attacks and assigned a number to each of them. Second, we applied the same procedure to a ready-made dataset in the Kaggle repository, "Intrusion Detection in Internet of Things Network", where the clustering model worked well and detected the number of attacks correctly, as shown in the results section.
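
To make the clustering step in the abstract concrete, the following is a minimal sketch of K-means grouping unlabeled traffic flows and then counting and numbering the resulting groups. It is not the authors' implementation: the two flow features, the synthetic data, the choice of k = 3, and the farthest-point initialization (used here only to keep the run deterministic) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-means on a list of tuples; returns (centroids, labels)."""
    # Farthest-point initialization (an assumption of this sketch):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Synthetic, unlabeled "flows" with two hypothetical normalized features
# (for example packet rate and mean packet size). Three traffic behaviours
# are simulated, 40 flows each.
rng = random.Random(42)
behaviour_centers = [(0.1, 0.1), (0.9, 0.2), (0.5, 0.9)]
flows = [
    (rng.gauss(cx, 0.03), rng.gauss(cy, 0.03))
    for cx, cy in behaviour_centers
    for _ in range(40)
]

centroids, labels = kmeans(flows, k=3)

# Count the flows in each cluster and report them by cluster number.
cluster_sizes = Counter(labels)
for cluster_id, size in sorted(cluster_sizes.items()):
    print(f"cluster {cluster_id}: {size} flows")
```

On this well-separated synthetic data, each of the three clusters should contain 40 flows. The cluster numbers are arbitrary identifiers, so deciding which numbered group corresponds to which attack type still requires inspecting the features of each group.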
