siForest: Detecting Network Anomalies with Set-Structured Isolation Forest
This work addresses efficient anomaly detection in network traffic for cybersecurity systems, but the contribution appears incremental, since it builds on an existing method (Isolation Forest) with specific modifications.
The paper tackles the detection of anomalous network behavior in cybersecurity by proposing siForest, a novel extension of Isolation Forest for set-structured data. Experiments on synthetic datasets suggest that it can outperform traditional approaches on some types of internet scan data.
As cyber threats continue to evolve in sophistication and scale, the ability to detect anomalous network behavior has become critical for maintaining robust cybersecurity defenses. Modern cybersecurity systems face the overwhelming challenge of analyzing billions of daily network interactions to identify potential threats, making efficient and accurate anomaly detection algorithms crucial for network defense. This paper investigates the use of variations of the Isolation Forest (iForest) machine learning algorithm for detecting anomalies in internet scan data. In particular, it presents the Set-Partitioned Isolation Forest (siForest), a novel extension of the iForest method designed to detect anomalies in set-structured data. By treating instances such as sets of multiple network scans with the same IP address as cohesive units, siForest effectively addresses some challenges of analyzing complex, multidimensional datasets. Extensive experiments on synthetic datasets simulating diverse anomaly scenarios in network traffic demonstrate that siForest has the potential to outperform traditional approaches on some types of internet scan data.
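To make the idea of scoring scans as per-IP sets concrete, here is a minimal sketch in plain Python. It is not the siForest algorithm, whose partitioning rule the abstract does not describe. It is a standard Isolation Forest scorer followed by a simple baseline that averages per-scan scores over all scans sharing an IP address. The function names (`anomaly_scores`, `score_ip_sets`), the `(ip, features)` input layout, the mean aggregation, and the default parameters are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of trying another dimension
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the expected remaining depth at the leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)

def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Standard iForest score in (0, 1]; higher means more anomalous.

    Assumes at least two points, each a tuple of numeric features.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(size)
    return [2 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in points]

def score_ip_sets(scans):
    """Baseline set-level score (an assumption, not siForest): the mean
    per-scan score over all scans that share an IP address.

    scans: list of (ip, feature_tuple) pairs.
    """
    scores = anomaly_scores([features for _, features in scans])
    by_ip = defaultdict(list)
    for (ip, _), s in zip(scans, scores):
        by_ip[ip].append(s)
    return {ip: sum(v) / len(v) for ip, v in by_ip.items()}
```

Calling `score_ip_sets` on a list of `(ip, features)` pairs returns one score per IP address, so addresses can be ranked as units instead of scan by scan. Averaging is only one possible aggregation; the maximum or a quantile of the per-scan scores would be equally reasonable baselines.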