Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning
This work addresses malware classification for cybersecurity. It is incremental: it builds on existing methods and adds optimizations such as parallel feature extraction and GPU acceleration.
The paper tackles the problem of classifying malware executables into known APT groups by automating and accelerating the process. It reports improved results with GPU-accelerated CNNs, which overcome the computational limitations of traditional models.
This paper presents a framework for automating and accelerating malware classification, specifically the mapping of malicious executables to known Advanced Persistent Threat (APT) groups. The primary features in this analysis are opcodes, the assembly-level instructions present in executables. Collecting opcodes from a large number of malicious samples is time-consuming, so open-source reverse engineering tools are used in tandem with scripts that leverage parallel computing to analyze multiple files at once. Both traditional machine learning and deep learning models are then trained to classify malware samples. One-gram and two-gram opcode datasets are constructed and used to train models such as support vector machines (SVM), k-nearest neighbors (KNN), and decision trees; however, these models struggle to produce adequate results unless metadata is used to supplement the n-gram sequences. The computational limitations of these models are overcome with convolutional neural networks (CNNs), whose training is heavily accelerated using graphics processing unit (GPU) resources.
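The following is a minimal sketch of the feature-extraction step described above: turning disassembly listings into one-gram and two-gram opcode counts, processing many samples concurrently. The paper does not name its reverse engineering tools or scripts, so this sketch assumes (hypothetically) that a disassembler has already produced a text listing per sample with one instruction per line and the mnemonic as the first token. All function names here (`parse_opcodes`, `ngram_counts`, `featurize`, `extract_all`, `to_vectors`) are illustrative, not the authors' code.

```python
from __future__ import annotations

from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def parse_opcodes(listing: str) -> list[str]:
    """Return the opcode mnemonic from each non-empty line of a listing.

    Assumption (hypothetical format): one instruction per line, with the
    mnemonic as the first whitespace-separated token, e.g. "mov eax, ebx".
    """
    opcodes = []
    for line in listing.splitlines():
        tokens = line.split()
        if tokens:
            opcodes.append(tokens[0].lower())  # normalize "MOV" -> "mov"
    return opcodes


def ngram_counts(opcodes: list[str], n: int) -> Counter:
    """Count contiguous opcode n-grams (n=1: one-grams, n=2: two-grams)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def featurize(listing: str) -> tuple[Counter, Counter]:
    """Per-sample work unit: one-gram and two-gram counts for one listing."""
    ops = parse_opcodes(listing)
    return ngram_counts(ops, 1), ngram_counts(ops, 2)


def extract_all(listings: Iterable[str], workers: int = 4) -> list[tuple[Counter, Counter]]:
    """Featurize many samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(featurize, listings))


def to_vectors(counts: list[Counter]) -> tuple[list[tuple[str, ...]], list[list[int]]]:
    """Build a shared n-gram vocabulary and dense count vectors for a classifier."""
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


if __name__ == "__main__":
    samples = [
        "push ebp\nmov ebp, esp\nmov eax, 1\n",
        "xor eax, eax\nret\n",
    ]
    results = extract_all(samples, workers=2)
    vocab, X = to_vectors([one_grams for one_grams, _ in results])
    print(vocab)
    print(X)
```

A thread pool fits the case where each task mostly waits on an external disassembler process; if the per-sample work is CPU-bound Python parsing, `concurrent.futures.ProcessPoolExecutor` is a drop-in alternative for `pool.map`. The resulting count vectors are the kind of input that classifiers such as SVM, KNN, or decision trees consume.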