A Survey of Network-based Intrusion Detection Data Sets
This work addresses the need for a structured evaluation of data sets in network intrusion detection. It is incremental, however, since it synthesizes existing work rather than introducing new methods or data.
The paper surveys network-based intrusion detection data sets, identifies 15 properties across five categories for assessing their suitability for specific evaluation scenarios, and provides a comprehensive overview along with recommendations for their use and creation.
Labeled data sets are necessary to train and evaluate anomaly-based network intrusion detection systems. This work provides a focused literature survey of data sets for network-based intrusion detection and describes the underlying packet- and flow-based network data in detail. The paper identifies 15 different properties to assess the suitability of individual data sets for specific evaluation scenarios. These properties cover a wide range of criteria and are grouped into five categories, such as data volume and recording environment, to enable a structured search. Based on these properties, a comprehensive overview of existing data sets is given. This overview also highlights the peculiarities of each data set. Furthermore, this work briefly touches upon other sources of network-based data, such as traffic generators and traffic repositories. Finally, we discuss our observations and provide some recommendations for the use and creation of network-based data sets.
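The text distinguishes packet-based from flow-based network data without showing how the two relate. As a rough illustration, and not the procedure used by any particular data set in the survey, flow records are commonly derived by grouping packets that share a 5-tuple (source IP, destination IP, source port, destination port, protocol) and summarizing each group with counters such as packet count, byte count, and duration. The `Packet` and `Flow` classes, the `aggregate_flows` function, and the 60-second idle timeout below are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (timestamp in seconds, size in bytes)."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int


@dataclass
class Flow:
    """Hypothetical flow record: counters summarizing packets with the same 5-tuple."""
    first_seen: float
    last_seen: float
    packets: int
    bytes: int

    @property
    def duration(self) -> float:
        return self.last_seen - self.first_seen


def aggregate_flows(packets, idle_timeout=60.0):
    """Group packets into unidirectional flows keyed by the 5-tuple.

    A flow is closed when no packet with the same key arrives within
    idle_timeout seconds; the next packet with that key starts a new flow.
    Returns (key, Flow) pairs ordered by the flow's first packet.
    """
    active = {}    # key -> currently open Flow
    finished = []  # closed (key, Flow) pairs
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flow = active.get(key)
        if flow is not None and p.timestamp - flow.last_seen > idle_timeout:
            finished.append((key, flow))  # idle timeout expired: close the old flow
            flow = None
        if flow is None:
            active[key] = Flow(p.timestamp, p.timestamp, 1, p.size)
        else:
            flow.last_seen = p.timestamp
            flow.packets += 1
            flow.bytes += p.size
    finished.extend(active.items())  # flush flows still open at the end of the capture
    return sorted(finished, key=lambda kv: kv[1].first_seen)


# Usage with made-up packets: a client request, a server reply, and a late packet
# on the same client 5-tuple that arrives after the idle timeout.
packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 60),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 1500),
    Packet(1.0, "10.0.0.2", "10.0.0.1", 80, 40000, "TCP", 400),
    Packet(120.0, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 60),
]
flows = aggregate_flows(packets, idle_timeout=60.0)
for key, f in flows:
    print(key, f.packets, f.bytes, f.duration)
```

Flows are kept unidirectional and split only on an idle timeout to keep the sketch short. Real flow exporters (for example NetFlow or IPFIX probes) typically also apply active timeouts, may terminate TCP flows on FIN/RST flags, and often merge both directions into a single bidirectional flow.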