The Curious Case of Machine Learning in Malware Detection
This is an incremental review that highlights limitations in applying machine learning to malware detection, targeting cybersecurity researchers and practitioners.
The paper argues that current machine learning techniques are inadequate for real-world malware detection due to unique challenges posed by evolving malware, and it identifies three critical problems limiting their success while proposing requirements for next-generation solutions.
In this paper, we argue that machine learning techniques are not ready for malware detection in the wild. Given the current trend in malware development and the increase in unconventional malware attacks, we expect that dynamic malware analysis is the future of antimalware detection and prevention systems. We first present a comprehensive review of machine learning for malware detection. Then, we discuss how malware detection in the wild presents unique challenges for current state-of-the-art machine learning techniques. We define three critical problems that limit the success of malware detectors powered by machine learning in the wild. Next, we discuss possible solutions to these challenges and present the requirements for next-generation malware detection. Finally, we outline potential research directions in machine learning for malware detection.