Malware Task Identification: A Data Driven Approach
This paper addresses the time-consuming, largely human-driven process of malware analysis faced by cybersecurity analysts, though the contribution appears incremental because it builds on existing data-driven methods.
The paper tackles the problem of automatically identifying the tasks that malware performs, such as logging keystrokes or establishing remote access. The approach often achieves an unbiased F1 score of over 0.9, outperforming current state-of-the-art software and standard machine learning approaches.
Identifying the tasks a given piece of malware was designed to perform (e.g. logging keystrokes, recording video, establishing remote access) is a difficult and time-consuming operation that is largely human-driven in practice. In this paper, we present an automated method to identify malware tasks. Using two different malware collections, we explore various circumstances for each, including cases where the training data differs significantly from the test data; where the malware being evaluated employs packing to thwart analytical techniques; and conditions with sparse training data. We find that this approach consistently outperforms the current state-of-the-art software for malware task identification as well as standard machine learning approaches, often achieving an unbiased F1 score of over 0.9. In the near future, we look to deploy our approach for use by analysts in an operational cyber-security environment.
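To make the reported metric concrete, the following is a minimal sketch of how an F1 score over malware tasks could be computed. The abstract does not define "unbiased F1", so this is an assumption: each sample has a set of predicted tasks and a set of actual tasks, F1 is the harmonic mean of precision and recall over those sets, and per-sample scores are averaged. The function names `task_f1` and `mean_task_f1` and the task labels are hypothetical, chosen only for illustration.

```python
def task_f1(predicted, actual):
    """F1 between a predicted and an actual set of task labels (assumed metric)."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)  # tasks correctly identified
    if true_pos == 0:
        # No overlap (or an empty set): precision or recall is zero or undefined.
        return 0.0
    precision = true_pos / len(predicted)  # fraction of predicted tasks that are correct
    recall = true_pos / len(actual)        # fraction of actual tasks that were found
    return 2 * precision * recall / (precision + recall)


def mean_task_f1(pairs):
    """Average per-sample F1 over (predicted, actual) pairs."""
    scores = [task_f1(pred, act) for pred, act in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical samples: (predicted tasks, actual tasks).
    samples = [
        ({"keylogging", "remote_access"}, {"keylogging", "remote_access"}),
        ({"keylogging"}, {"keylogging", "video_recording"}),
    ]
    print(mean_task_f1(samples))
```

Under this reading, a sample scores 1.0 only when the predicted tasks match the actual tasks exactly, and missing or spurious tasks both lower the score.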