Automatic Mapping of Unstructured Cyber Threat Intelligence: An Experimental Study
This work addresses the need for automated processing of unstructured cyber threat intelligence (CTI) to support proactive security efforts; however, its contribution is incremental, as it builds on existing machine learning (ML) methods for classification tasks.
The study tackles the problem of automatically classifying unstructured cyber threat intelligence into attack techniques using machine learning. It contributes two new datasets and evaluates a range of models to identify the best-performing classifiers and the main causes of classification errors.
Proactive approaches to security, such as adversary emulation, leverage information about threat actors and their techniques (Cyber Threat Intelligence, CTI). However, most CTI still comes in unstructured form (i.e., natural language), such as incident reports and leaked documents. To support proactive security efforts, we present an experimental study on the automatic classification of unstructured CTI into attack techniques using machine learning (ML). We contribute two new datasets for CTI analysis, and we evaluate several ML models, including both traditional and deep learning-based ones. We present several lessons learned: how well ML performs at this task, which classifiers perform best and under which conditions, what the main causes of classification errors are, and which challenges lie ahead for CTI analysis.
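To make the classification task concrete, here is a minimal sketch of one "traditional" ML baseline. It is a multinomial Naive Bayes classifier with add-one smoothing that maps a CTI sentence to a technique label. This is not the authors' model, since the abstract does not name the specific classifiers or features evaluated. It assumes that techniques are identified by MITRE ATT&CK IDs. The training sentences are hypothetical toy examples. The labels T1566 (Phishing), T1059 (Command and Scripting Interpreter), and T1486 (Data Encrypted for Impact) are used only for illustration. The names `tokenize` and `NaiveBayesTechniqueClassifier` are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTechniqueClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustrative baseline; not the model evaluated in the paper.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of sentences per technique
        self.word_counts = defaultdict(Counter)  # token counts per technique
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        """Return the technique label with the highest log-posterior score."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / self.n_docs)  # log prior
            for tok in tokens:
                # Smoothed log-likelihood of the token given the technique
                score += math.log((counts[tok] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (not from the paper's datasets), labeled with
# MITRE ATT&CK technique IDs chosen for illustration only.
train = [
    ("The attackers sent spearphishing emails with malicious attachments to employees", "T1566"),
    ("Victims received a phishing email containing a link to a credential harvesting page", "T1566"),
    ("The malware executed PowerShell commands to download a second stage payload", "T1059"),
    ("A batch script ran cmd commands to launch the implant", "T1059"),
    ("The ransomware encrypted files on network shares and demanded a ransom", "T1486"),
    ("Files were encrypted and a ransom note was dropped on the desktop", "T1486"),
]
texts, labels = zip(*train)
clf = NaiveBayesTechniqueClassifier().fit(texts, labels)

print(clf.predict("The group used phishing emails with attachments"))   # T1566
print(clf.predict("The loader ran PowerShell commands"))                # T1059
print(clf.predict("Attackers encrypted files and left a ransom note"))  # T1486
```

A bag-of-words model like this one depends on keyword overlap between a report and the labeled training sentences. As a result, it is sensitive to phrasing and vocabulary that differ from the training data.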