Analyzing the Usefulness of the DARPA OpTC Dataset in Cyber Threat Detection Research
This work addresses the need for modern datasets in cybersecurity research to combat sophisticated attacks such as Advanced Persistent Threats (APTs), though its contribution is incremental: it evaluates a dataset rather than proposing new detection methods.
The paper analyzes the DARPA OpTC dataset for cyber threat detection, finding it an excellent candidate for research due to its reflection of real-world enterprise scenarios, while also noting limitations.
Maintaining security and privacy in real-world enterprise networks is becoming more and more challenging. Cyber actors are increasingly employing previously unreported and state-of-the-art techniques to break into corporate networks. To develop novel and effective methods to thwart these sophisticated cyberattacks, we need datasets that reflect real-world enterprise scenarios to a high degree of accuracy. However, precious few such datasets are publicly available. Researchers still predominantly use the decade-old KDD datasets; however, studies have shown that these datasets do not adequately reflect modern attacks such as Advanced Persistent Threats (APTs). In this work, we analyze the usefulness of the recently introduced DARPA Operationally Transparent Cyber (OpTC) dataset in this regard. We describe the content of the dataset in detail and present a qualitative analysis. We show that the OpTC dataset is an excellent candidate for advanced cyber threat detection research while also highlighting its limitations. Additionally, we propose several research directions where this dataset can be useful.