Microsoft Malware Classification Challenge
The paper offers cybersecurity researchers a comparative analysis that can streamline future work, although its contribution is incremental because it builds on existing uses of the dataset.
The paper addresses the need for a standardized benchmark in malware research by analyzing more than 50 publications that cite the Microsoft Malware Classification Challenge dataset, which comprises nearly 0.5 terabytes of data from more than 20K malware samples. The goals are to identify research directions and to support future performance evaluation on the dataset.
The Microsoft Malware Classification Challenge was announced in 2015 together with the release of a dataset of nearly 0.5 terabytes, consisting of the disassembly and bytecode of more than 20K malware samples. Beyond its use in the Kaggle competition, the dataset has become a standard benchmark for research on modeling malware behaviour. To date, it has been cited in more than 50 research papers. Here we provide a high-level comparison of the publications citing the dataset. This comparison makes it easier to identify potential research directions in the field and to evaluate future performance on the dataset.