Self-Supervised Vision Transformers for Malware Detection
This work addresses the problem of detecting previously unseen malware in cybersecurity applications. It represents an incremental improvement over prior methods, with reported gains in accuracy and macro-F1.
The paper tackles malware detection by proposing SHERLOCK, a self-supervised Vision Transformer (ViT) model that learns from image-based representations of binaries. It achieves 97% accuracy in binary (malware vs. benign) classification. It also outperforms state-of-the-art methods in multi-class classification, with macro-F1 scores of 0.497 for malware types and 0.491 for malware families.
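The gap between 97% accuracy and macro-F1 scores near 0.5 is easier to read once you see how macro-F1 works. It is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. With 47 types and 696 families, a model can be strong on frequent classes and still score low overall. The function below is a minimal pure-Python sketch written for illustration. It is intended to follow the usual definition, as in scikit-learn's `f1_score(average="macro")`, and is not taken from the paper's evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed an instance of class t
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A majority-class predictor: 80% accuracy, but the rare class "b" gets F1 = 0.
y_true = ["a", "a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a", "a"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.444
```

In the toy example, accuracy is 0.8, but macro-F1 averages 8/9 for class "a" with 0 for class "b", giving 4/9. This is the same kind of penalty a long tail of rare malware families would produce.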
Malware detection plays a crucial role in cybersecurity, given the growth of malware and advances in cyber-attacks. These attacks often use previously unseen malware that security vendors have not yet identified. This makes it increasingly necessary to find a solution that can learn on its own from unlabeled sample data. This paper presents SHERLOCK, a self-supervision-based deep learning model for detecting malware, built on the Vision Transformer (ViT) architecture. SHERLOCK is a novel malware detection method that uses an image-based representation of binaries to learn distinctive features that differentiate malware from benign programs. Experiments use 1.2 million Android applications spanning a hierarchy of 47 types and 696 families. The results show that self-supervised learning can achieve 97% accuracy for binary classification of malware, which is higher than existing state-of-the-art techniques. Our proposed model also outperforms state-of-the-art techniques for multi-class classification of malware types and families, with macro-F1 scores of 0.497 and 0.491, respectively.
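The abstract does not spell out how a binary becomes an image or how a ViT consumes it. The sketch below illustrates one common approach and is not a description of SHERLOCK's actual pipeline:

- It reads raw bytes as 8-bit grayscale pixels and lays them out in fixed-width rows.
- It cuts the result into non-overlapping square patches, which are the tokens a ViT embeds.

The row width (256) and patch size (16) are illustrative assumptions, as are the helper names `bytes_to_image` and `to_patches`. The paper's input source (for example, which parts of an Android app are used), image size, resizing and self-supervised pretraining objective are not specified here and are omitted.

```python
import math


def bytes_to_image(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay raw bytes out as rows of `width` grayscale pixels (0-255).

    The final row is zero-padded so every row has the same length.
    """
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))  # bytes(n) = n zero bytes
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_patches(image: list[list[int]], patch: int = 16) -> list[list[int]]:
    """Split an image into non-overlapping patch x patch tiles, row-major.

    Each tile is flattened to a list of patch*patch pixel values, the form a
    ViT would linearly project into a token embedding. Trailing rows/columns
    that do not fill a whole tile are dropped in this simplified version.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


# 8192 bytes -> a 32 x 256 image -> (32/16) * (256/16) = 32 patches of 256 values.
img = bytes_to_image(bytes(range(256)) * 32)
tokens = to_patches(img)
print(len(img), len(img[0]), len(tokens), len(tokens[0]))  # prints 32 256 32 256
```

Fixing the row width keeps byte offsets spatially aligned across files, so recurring byte patterns show up as similar local textures. Real ViT pipelines typically also resize inputs to a fixed resolution before patching, which this sketch leaves out.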