Command-line Obfuscation Detection using Small Language Models
This work addresses a critical security challenge for organizations by offering a scalable alternative to signature-based detection, though the contribution is incremental: it applies existing NLP techniques to a specific domain.
The paper tackles the problem of detecting command-line obfuscation used by adversaries to evade security systems, presenting an NLP-based method using a small transformer language model that achieves high-precision detections on real-world telemetry from diverse environments.
To avoid detection, adversaries often use command-line obfuscation. Numerous command-line obfuscation techniques exist, all designed to alter the command-line syntax without affecting its original functionality. This variability forces most security solutions to exhaustively enumerate signatures for even a single pattern. Instead of relying on signatures, we have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model and can be applied to any source of execution logs. Evaluation on real-world telemetry demonstrates that our approach yields high-precision detections even on high-volume telemetry from a diverse set of environments, ranging from universities and businesses to healthcare and finance. We demonstrate the practical value in a case study of real-world samples detected by our model: we show the model's superiority to signatures on established malware known to employ obfuscation and showcase previously unseen obfuscated samples that the model detected.
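The signature-enumeration problem described above can be illustrated with a minimal Python sketch. It is not the paper's method: the signature, the sample command lines, and the helper names (`SIGNATURE`, `VARIANTS`, `signature_hits`, `strip_cmd_escapes`) are all hypothetical and constructed for this illustration. The variants rely on two commonly described `cmd.exe` behaviors, caret escapes and empty paired double quotes, both of which leave the executed command unchanged.

```python
import re

# Hypothetical signature (an assumption for this sketch): the literal token
# "powershell" followed by the abbreviated encoded-command flag.
SIGNATURE = re.compile(r"powershell(\.exe)?\s+-enc", re.IGNORECASE)

# Illustrative variants written for this sketch, not taken from the paper's data.
# "AAAA" is a placeholder payload.
VARIANTS = [
    "powershell -enc AAAA",        # plain form
    "p^ow^er^shell -enc AAAA",     # caret escapes inserted
    'pow""ershell -enc AAAA',      # empty quote pair inserted
    "PoWeRsHeLl   -enc AAAA",      # mixed case and extra whitespace
]


def signature_hits(command_lines):
    """Return one boolean per command line: does the signature match?"""
    return [bool(SIGNATURE.search(line)) for line in command_lines]


def strip_cmd_escapes(command_line):
    """Crude normalization: drop carets and double quotes.

    This handles only the two tricks used above and would also mangle
    legitimate quoted arguments, so it is not a general deobfuscator.
    """
    return re.sub(r'[\^"]', "", command_line)


if __name__ == "__main__":
    print(signature_hits(VARIANTS))
    print(signature_hits([strip_cmd_escapes(v) for v in VARIANTS]))
```

The single signature matches only the plain and mixed-case variants. Each additional trick requires either another signature or another normalization rule, which is the enumeration burden that motivates a learned detector.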