CR · May 6, 2019

A Benchmark API Call Dataset for Windows PE Malware Classification

arXiv:1905.01999v2 · 65 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a new benchmark dataset for researchers working on malware detection. Its contribution is incremental, since it focuses on data creation rather than novel methods.

The authors address Windows PE malware classification by building a dataset of 7107 malware samples from various families. Each sample was run in a sandbox environment, its API calls were recorded, and the results were formatted for use with different classification algorithms.

Using operating system API calls is a promising approach to detecting PE-type malware on the Windows operating system. The task is formally defined as running malware in an isolated sandbox environment, recording the API calls it makes to the Windows operating system, and analyzing those calls sequentially. Here, we analyzed 7107 distinct malware samples belonging to various families, such as virus, backdoor, and trojan, in an isolated sandbox environment and transformed the analysis results into a format that different classification algorithms and methods can use. First, we explain how we obtained the malware, and then how we grouped the samples into families. Finally, for researchers who will use the dataset we created, we describe how to perform malware classification tasks with different computational methods.
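As a rough illustration of what classifying malware from recorded API call sequences can look like, the sketch below builds bigram counts over each sequence and assigns a new sequence to the family whose aggregated bigram profile is most similar by cosine similarity. It is not the method from the paper. The toy sequences, the family labels attached to them, and the helper names (ngram_counts, cosine, fit_centroids, predict) are all assumptions made for illustration, and the real dataset's file layout is not assumed here.

```python
from collections import Counter
from math import sqrt


def ngram_counts(calls, n=2):
    """Count contiguous n-grams in one API call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(sequences, labels, n=2):
    """Sum the n-gram counts of all training sequences per family."""
    centroids = {}
    for calls, label in zip(sequences, labels):
        centroids.setdefault(label, Counter()).update(ngram_counts(calls, n))
    return centroids


def predict(centroids, calls, n=2):
    """Return the family whose n-gram profile is closest to the sequence."""
    features = ngram_counts(calls, n)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))


# Toy, hand-written sequences (hypothetical, NOT taken from the dataset).
TRAIN = [
    ("RegOpenKeyExW RegSetValueExW RegCloseKey CreateProcessInternalW".split(), "Trojan"),
    ("RegOpenKeyExW RegSetValueExW RegCloseKey NtCreateFile".split(), "Trojan"),
    ("socket connect send recv closesocket".split(), "Backdoor"),
    ("socket connect recv send recv closesocket".split(), "Backdoor"),
]

centroids = fit_centroids([seq for seq, _ in TRAIN], [lab for _, lab in TRAIN])
print(predict(centroids, "socket connect send closesocket".split()))
```

Bigrams are used because the order of calls carries information that a plain bag of call names would lose. With real data, a held-out split and a stronger sequence model would be the natural next steps. Note that a sequence sharing no bigrams with any family falls back to the first family seen in training.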

Code Implementations: 3 repos
