Machine Learning in Cyber-Security - Problems, Challenges and Data Sets
The paper addresses data scarcity and labeling problems that researchers face in cyber-security, but its contribution is incremental: it centers on dataset creation and a labeling technique.
The paper tackles the problem of applying machine learning to cyber-security by identifying key challenges and providing novel datasets to enable research, along with a method for generating labels to address the lack of labeled data.
We present cyber-security problems of high importance. We show that solving these problems requires coping with certain machine learning challenges. We provide novel data sets representing the problems, so that the academic community can investigate them and suggest methods to cope with the challenges. We also present a method to generate labels via pivoting, which addresses the common lack of labels in cyber-security.
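The abstract names label generation via pivoting without describing the procedure. As a rough illustration only, the sketch below assumes that pivoting means propagating a known label through a shared attribute: starting from seed domains known to be malicious, moving to the IP addresses that host them, and then flagging other domains hosted on the same IPs. The function name `pivot_labels`, the domain-to-IP mapping, and the label strings are hypothetical and are not taken from the paper.

```python
def pivot_labels(seed_malicious, domain_to_ip):
    """Propagate labels from seed domains to domains that share an IP.

    Hypothetical sketch: the shared-IP rule, the argument shapes, and the
    label strings are assumptions, not the paper's actual method.
    """
    # Step 1: pivot from the known-malicious seed domains to their hosting IPs.
    suspicious_ips = {
        domain_to_ip[domain]
        for domain in seed_malicious
        if domain in domain_to_ip
    }

    # Step 2: pivot back from those IPs to every domain hosted on them.
    labels = {}
    for domain, ip in domain_to_ip.items():
        if domain in seed_malicious:
            labels[domain] = "malicious (seed)"
        elif ip in suspicious_ips:
            labels[domain] = "suspected (pivoted)"
        else:
            labels[domain] = "unlabeled"
    return labels


# Toy data using reserved example domains and documentation IP ranges.
domain_to_ip = {
    "bad.example": "192.0.2.10",
    "neighbor.example": "192.0.2.10",
    "other.example": "198.51.100.7",
}
labels = pivot_labels({"bad.example"}, domain_to_ip)
```

Labels produced this way are noisy, since shared hosting also places benign domains on the same IP, so they are better treated as weak labels than as ground truth.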