Analysis of Machine Learning Approaches to Packing Detection
The paper addresses malware detection, a practical challenge for cybersecurity practitioners. Its contribution is incremental: it builds on prior research rather than introducing new methods.
It tackles the problem of identifying which machine learning approaches and features are most effective for detecting packed malware, and finds that certain algorithms and features perform best in terms of both accuracy and cost, reaching accuracy of up to 95% with reduced computational overhead.
Packing is an obfuscation technique widely used by malware to hide the content and behavior of a program. Much prior research has explored how to detect whether a program is packed, using a broad variety of approaches such as entropy analysis, syntactic signatures, and, more recently, machine learning classifiers built on various features. However, no robust results have established which algorithms perform best or which features are most significant. The question is further complicated by the choice of evaluation criteria, since accuracy, cost, generalization capability, and other measures are all reasonable. This work explores eleven different machine learning approaches using 119 features to determine which features are most significant for packing detection, which algorithms offer the best performance, and which algorithms are most economical.
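Entropy analysis, one of the prior approaches listed above, relies on the observation that compressed or encrypted payloads look close to random, so their byte distribution has high Shannon entropy, H = Σ p_i log2(1/p_i), measured in bits per byte (0 to 8). The following is a minimal Python sketch, assuming the raw file bytes are available and using a hypothetical threshold of 7.0 bits per byte chosen only for illustration (it is not a value taken from this work). The function names `shannon_entropy` and `looks_packed` are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical cut-off for illustration only; not a value reported by this work.
ENTROPY_THRESHOLD = 7.0


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, in the range 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    # H = sum over observed byte values of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def looks_packed(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Naive heuristic: flag the buffer as packed if its entropy meets the threshold."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    uniform = bytes(range(256)) * 4  # every byte value equally frequent -> 8.0 bits/byte
    constant = b"A" * 1024           # a single repeated byte -> 0.0 bits/byte
    print(shannon_entropy(uniform), looks_packed(uniform))
    print(shannon_entropy(constant), looks_packed(constant))
```

Whole-file entropy is a coarse, single feature; detectors often compute it per section of the executable instead, and the machine learning approaches studied here combine it with many other features (119 in total) rather than relying on one fixed threshold.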