Analyzing and Storing Network Intrusion Detection Data using Bayesian Coresets: A Preliminary Study in Offline and Streaming Settings
The paper addresses the problem of handling large, redundant network traffic data for security analysts; as a preliminary study, it represents incremental progress in adapting existing methods to this domain.
The paper tackles the challenge of applying Bayesian machine learning to large-scale network intrusion detection data. It uses Bayesian coresets to reduce the number of data samples while preserving an accurate posterior distribution, showing that this approach makes learning feasible and reduces memory and storage needs in both offline and streaming settings.
In this paper we offer a preliminary study of the application of Bayesian coresets to network security data. Network intrusion detection is a field that could take advantage of Bayesian machine learning for modelling uncertainty and managing streaming data; however, the large size of the data sets often hinders the use of Bayesian learning methods based on MCMC (Markov chain Monte Carlo). Limiting the data to what is actually useful is a central problem in a field like network traffic analysis, where large amounts of redundant data can be generated very quickly via packet collection. Reducing the number of samples would not only make learning more feasible, but would also help reduce the need for memory and storage. We explore here the use of Bayesian coresets, a technique that reduces the number of data samples while guaranteeing that an accurate posterior distribution can still be learned. We analyze how Bayesian coresets affect the accuracy of learned models, and how time and space requirements are traded off, both in a static scenario and in a streaming scenario.
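To make the idea concrete, the sketch below illustrates one simple way a Bayesian coreset can be built. In the usual formulation, the full-data log-likelihood sum over n of L_n(theta) is replaced by a weighted sum over a small subset, sum over n of w_n * L_n(theta), and MCMC then targets the coreset posterior, proportional to pi_0(theta) * exp(sum_n w_n * L_n(theta)), at a cost that scales with the coreset size rather than with N. The abstract does not say which model or coreset construction the study uses, so this is only a minimal sketch under stated assumptions: a logistic-regression likelihood over labelled records (y in {-1, +1}), and importance sampling with probabilities proportional to 1 + ||x_n||, a hypothetical stand-in for a proper sensitivity bound. All function names are illustrative, not taken from the paper.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (feature vector x_n, label y_n in {-1, +1}); a hypothetical record format.
Point = Tuple[Sequence[float], int]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def point_loglik(theta: Sequence[float], point: Point) -> float:
    """Log-likelihood L_n(theta) of one labelled record under logistic regression."""
    x, y = point
    return log_sigmoid(y * sum(t * xi for t, xi in zip(theta, x)))


def weighted_loglik(theta: Sequence[float], data: List[Point],
                    weights: Dict[int, float]) -> float:
    """Coreset log-likelihood: sum of w_n * L_n(theta) over the kept indices."""
    return sum(w * point_loglik(theta, data[n]) for n, w in weights.items())


def importance_coreset(data: List[Point], m: int, seed: int = 0) -> Dict[int, float]:
    """Build a weighted coreset from m importance-sampled draws (with replacement).

    Sampling probabilities p_n are proportional to 1 + ||x_n||, a hypothetical
    sensitivity proxy. Weights w_n = count_n / (m * p_n) make the coreset
    log-likelihood an unbiased estimate of the full-data log-likelihood.
    """
    rng = random.Random(seed)
    scores = [1.0 + math.sqrt(sum(v * v for v in x)) for x, _ in data]
    total = sum(scores)
    probs = [s / total for s in scores]
    draws = rng.choices(range(len(data)), weights=probs, k=m)
    counts = Counter(draws)
    return {n: c / (m * probs[n]) for n, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for labelled traffic records (not real intrusion data).
    data: List[Point] = []
    for _ in range(2000):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if x[0] - 0.5 * x[1] + rng.gauss(0, 0.5) > 0 else -1
        data.append((x, y))
    theta = [0.5, -0.25]
    full = weighted_loglik(theta, data, {n: 1.0 for n in range(len(data))})
    core = importance_coreset(data, m=200, seed=1)
    approx = weighted_loglik(theta, data, core)
    print(f"kept {len(core)} of {len(data)} points; full={full:.1f}, coreset={approx:.1f}")
```

Only the kept indices and their weights need to be stored, which is where the memory and storage savings come from. Because the estimate is a sum, coresets built on disjoint batches could in principle be combined by summing their weighted log-likelihoods, which is one natural way to carry the idea into a streaming setting; this sketch does not claim that the study proceeds this way.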