Robustness of ML-Enhanced IDS to Stealthy Adversaries
This work addresses robustness concerns for cybersecurity practitioners who deploy ML-based intrusion detection systems in real-world environments, where clean training data is unavailable.
The paper tackles the problem of training ML-enhanced intrusion detection systems on poisoned data, that is, training data that already contains malicious activity, and demonstrates that an autoencoder-based anomaly detection system remains robust to such poisoning.
Intrusion Detection Systems (IDS) enhanced with Machine Learning (ML) have demonstrated the capacity to efficiently build a prototype of "normal" cyber behaviors in order to detect cyber threat activity with greater accuracy than traditional rule-based IDS. Because these systems are largely black boxes, their acceptance requires proof of robustness to stealthy adversaries. Since it is impossible, outside of controlled experiments, to build a baseline from activity that is completely free of malicious cyber actors, the training data for deployed models will be poisoned with examples of activity that analysts would want to be alerted about. We train an autoencoder-based anomaly detection system on network activity with various proportions of malicious activity mixed in and demonstrate that it is robust to this sort of poisoning.
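
The poisoning experiment described above can be illustrated with a minimal sketch. It does not reproduce the paper's data, architecture, or results. Everything in it is an assumption made for illustration: the synthetic "benign" and "malicious" samples, the dimensions `DIM` and `LATENT`, the helper names, and the use of a linear autoencoder fitted in closed form (equivalent to PCA) in place of whatever network the authors trained. The sketch mixes a chosen fraction of malicious samples into the training set, fits the autoencoder, scores held-out samples by reconstruction error, and reports the AUC, which gives a threshold-free measure of how well malicious activity is still separated from benign activity.

```python
import numpy as np

DIM, LATENT = 10, 3  # hypothetical feature and bottleneck sizes


def sample_benign(rng, n, basis):
    # Assumed benign traffic: points near a low-dimensional subspace.
    z = rng.normal(size=(n, LATENT))
    return 3.0 * z @ basis + 0.1 * rng.normal(size=(n, DIM))


def sample_malicious(rng, n):
    # Assumed malicious traffic: points spread in all directions.
    return 2.0 * rng.normal(size=(n, DIM))


def fit_linear_autoencoder(X, k):
    # Closed-form optimum of a linear autoencoder: top-k principal directions.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(X, mean, components):
    # Encode to k dimensions, decode back, and measure the squared residual.
    centered = X - mean
    recon = centered @ components.T @ components
    return np.sum((centered - recon) ** 2, axis=1)


def auc_under_poisoning(poison_fraction, seed=0, n_train=2000, n_test=500):
    rng = np.random.default_rng(seed)
    # Orthonormal rows spanning the assumed benign subspace.
    basis = np.linalg.qr(rng.normal(size=(DIM, LATENT)))[0].T

    # Training set: benign activity with a fraction of malicious samples mixed in.
    n_bad = int(poison_fraction * n_train)
    train = np.vstack([
        sample_benign(rng, n_train - n_bad, basis),
        sample_malicious(rng, n_bad),
    ])
    mean, comps = fit_linear_autoencoder(train, LATENT)

    # Labeled held-out data, used only for evaluation.
    ben_err = reconstruction_error(sample_benign(rng, n_test, basis), mean, comps)
    mal_err = reconstruction_error(sample_malicious(rng, n_test), mean, comps)

    # AUC: probability that a malicious sample scores higher than a benign one.
    return float(np.mean(mal_err[:, None] > ben_err[None, :]))


if __name__ == "__main__":
    for frac in (0.0, 0.01, 0.05, 0.2):
        print(f"poison fraction {frac:.2f}: AUC = {auc_under_poisoning(frac):.3f}")
```

Sweeping `poison_fraction` mirrors the "various proportions" in the abstract. Because the synthetic classes here are easy to separate by construction, the output only shows the shape of the experiment and says nothing about how a real IDS behaves against a stealthy adversary.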