CR, LG · Aug 12, 2022

On deceiving malware classification with section injection

arXiv:2208.06092v1 · 7 citations · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses vulnerabilities in malware classification for cybersecurity, showing that current systems may be less trustworthy than reported, with incremental improvements in robustness through data augmentation.

The authors tackled the problem of deceiving malware classification systems by injecting random bytes into executable files, which caused a 25-40% accuracy drop in family classification with only a 7% increase in file size.

We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology to inject bytes at random positions across a malware file and to use it both as an attack that decreases classification accuracy and as a defensive method that augments the data available for training. The injection respects the operating system's file format, so the malware still executes afterwards and its behavior does not change. We reproduced five state-of-the-art malware classification approaches to evaluate our injection scheme: one based on GIST+KNN, three CNN variations, and one Gated CNN. We performed our experiments on a public dataset with 9,339 malware samples from 25 different families. Our results show that a mere 7% increase in malware size causes an accuracy drop between 25% and 40% for malware family classification. This shows that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluate using modified malware samples alongside the original ones to increase the networks' robustness against these attacks. Results show that a combination of reordering malware sections and injecting random data can improve the overall performance of the classification. Code available at https://github.com/adeilsonsilva/malware-injection.
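The two transformations named in the abstract, injecting random data and reordering sections, can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes an executable can be represented as a plain list of (name, payload) sections and does not parse or patch any real file headers, which a format-respecting injection would have to do. The function names, the `.inj` section name, and the `budget_percent` parameter are hypothetical and chosen only for illustration.

```python
import random


def inject_random_section(sections, budget_percent=7, rng=None):
    """Return a copy of `sections` with one extra section of random bytes.

    `sections` is a list of (name, payload) pairs standing in for the
    sections of an executable. This is a toy model: no headers are updated.
    """
    rng = rng or random.Random()
    original_size = sum(len(payload) for _, payload in sections)
    # Size of the injected data, as an integer percentage of the original size.
    n_bytes = original_size * budget_percent // 100
    noise = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    # Pick a random slot: before the first section, between two, or after the last.
    position = rng.randint(0, len(sections))
    return sections[:position] + [(".inj", noise)] + sections[position:]


def reorder_sections(sections, rng=None):
    """Return a shuffled copy of `sections` (toy version of section reordering)."""
    rng = rng or random.Random()
    shuffled = list(sections)
    rng.shuffle(shuffled)
    return shuffled


def to_bytes(sections):
    """Flatten the toy section list into the byte string a classifier would see."""
    return b"".join(payload for _, payload in sections)


# Hypothetical 1000-byte sample made of three sections.
sample = [
    (".text", b"\x90" * 600),
    (".data", b"\x00" * 300),
    (".rsrc", b"\x01" * 100),
]
rng = random.Random(0)

attacked = inject_random_section(sample, budget_percent=7, rng=rng)
augmented = reorder_sections(attacked, rng=rng)
growth = len(to_bytes(attacked)) / len(to_bytes(sample)) - 1
```

In this model the same routine serves both roles described above: applied to test samples it acts as the attack, and applied to training samples it produces extra variants for augmentation. The 7% budget mirrors the size increase reported in the abstract.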

Code Implementations: 1 repo
