CR · LG · Aug 20, 2020

PicoDomain: A Compact High-Fidelity Cybersecurity Dataset

arXiv:2008.09192v1 · 6 citations
AI Analysis

PicoDomain provides a practical tool for rapid validation and iterative development of cybersecurity analytics platforms, addressing a bottleneck for researchers and developers. The contribution is incremental, however, as it builds on existing dataset creation efforts.

The paper tackles the lack of compact, high-fidelity cybersecurity datasets with ground truth by developing PicoDomain, a collection of Zeek logs from a realistic intrusion simulated on a small-scale network. The dataset is validated using traditional statistical analysis and off-the-shelf machine learning techniques.

Analysis of cyber relevant data has become an area of increasing focus. As larger percentages of businesses and governments begin to understand the implications of cyberattacks, the impetus for better cybersecurity solutions has increased. Unfortunately, current cybersecurity datasets either offer no ground truth or do so with anonymized data. The former leads to a quandary when verifying results and the latter can remove valuable information. Additionally, most existing datasets are large enough to make them unwieldy during prototype development. In this paper we have developed the PicoDomain dataset, a compact high-fidelity collection of Zeek logs from a realistic intrusion using relevant Tools, Techniques, and Procedures. While simulated on a small-scale network, this dataset consists of traffic typical of an enterprise network, which can be utilized for rapid validation and iterative development of analytics platforms. We have validated this dataset using traditional statistical analysis and off-the-shelf Machine Learning techniques.
