CR, LG · Jan 30, 2023

Behavioural Reports of Multi-Stage Malware

arXiv:2301.12800v1 · 6 citations · h-index: 26
Originality: Synthesis-oriented
AI Analysis

This work provides a resource for improving host-based intrusion detection systems, particularly in anti-malware applications. Its contribution is incremental: it builds on existing machine learning approaches and adds a new dataset and a tagging method.

The authors address malware detection by building a new dataset of API call sequences from malware samples executed in Windows 10 virtual machines. They also developed a multi-label classification system that tags each sequence with one or more malicious behaviours, and they provide a benchmark showing how the dataset can be used to classify malware.

The extensive damage caused by malware requires anti-malware systems to be constantly improved to counter new threats. The current trend in malware detection is to employ machine learning models to aid in the classification process. We propose a new dataset with the objective of improving current anti-malware systems. The focus of this dataset is to improve host-based intrusion detection systems by providing API call sequences for thousands of malware samples executed in Windows 10 virtual machines. A tutorial on how to create and expand this dataset is provided, along with a benchmark demonstrating how to use this dataset to classify malware. The data contains long sequences of API calls for each sample, and in order to create models that can be deployed on resource-constrained devices, three feature selection methods were tested. The principal innovation, however, lies in the multi-label classification system, in which one sequence of APIs can be tagged with multiple labels describing its malicious behaviours.
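To make the data layout in the abstract concrete, here is a minimal sketch of how one API call sequence can carry several behaviour labels at once, and how a simple feature selection step can shrink the input. Everything in it is an assumption for illustration: the toy sequences, the label names (`c2`, `dropper`, `persistence`), the helper functions, and the choice of document-frequency ranking as the selection method. The abstract does not name the three feature selection methods the authors tested, so this is a stand-in, not their approach.

```python
from collections import Counter

# Hypothetical toy data: (API call sequence, set of behaviour labels).
# The sequences and label names are invented for illustration only.
SAMPLES = [
    (["CreateFileW", "WriteFile", "RegSetValueExW", "CloseHandle"],
     {"persistence", "dropper"}),
    (["InternetOpenW", "InternetConnectW", "HttpSendRequestW", "CloseHandle"],
     {"c2"}),
    (["CreateFileW", "WriteFile", "InternetOpenW", "HttpSendRequestW"],
     {"dropper", "c2"}),
]


def select_top_k_calls(samples, k):
    """Keep the k API calls that appear in the most samples.

    This is plain document-frequency ranking, used here as an assumed
    stand-in for a feature selection method.
    """
    doc_freq = Counter()
    for seq, _ in samples:
        doc_freq.update(set(seq))  # count each call once per sample
    # Sort by descending frequency, then by name, so ties are deterministic.
    ranked = sorted(doc_freq, key=lambda call: (-doc_freq[call], call))
    return ranked[:k]


def count_vector(seq, vocabulary):
    """Turn a sequence into per-call counts over the selected vocabulary."""
    counts = Counter(seq)
    return [counts[call] for call in vocabulary]


def label_vector(labels, label_names):
    """Encode a label set as a binary indicator row (multi-label target)."""
    return [1 if name in labels else 0 for name in label_names]


label_names = sorted({name for _, labels in SAMPLES for name in labels})
vocabulary = select_top_k_calls(SAMPLES, k=4)
X = [count_vector(seq, vocabulary) for seq, _ in SAMPLES]
Y = [label_vector(labels, label_names) for _, labels in SAMPLES]

print(vocabulary)
print(label_names)
for features, targets in zip(X, Y):
    print(features, targets)
```

Each row of `Y` can contain more than one `1`, which is what distinguishes multi-label tagging from ordinary multi-class classification. A separate binary classifier per column of `Y` is one common way to train on such targets.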

Code Implementations: 1 repo
