Evaluating the Performance of Twitter-based Exploit Detectors
This work supports patch prioritization for systems administrators by improving exploit detection through social media analysis. Its contribution is incremental: it builds on existing methods, adding new data sources and refined ground truth.
The paper tackles the problem of detecting real-world exploits for patch prioritization. It combines Twitter data with public databases to classify vulnerabilities as exploited or not exploited. It finds that LightGBM improves results and that tweet and user statistics are more informative than the tweet text.
Patch prioritization is a crucial aspect of information systems security, and knowing which vulnerabilities have been exploited in the wild is a powerful tool to help systems administrators accomplish this task. Analyzing social media for this purpose can improve the results and make detection more agile, by collecting data from online discussions and applying machine learning techniques to detect real-world exploits. In this paper, we use a technique that combines Twitter data with information from public databases to classify vulnerabilities as exploited or not exploited. We analyze the behavior of different classification algorithms, investigate the influence of using different antivirus data as ground truth, and experiment with various time window sizes. Our findings suggest that using a Light Gradient Boosting Machine (LightGBM) can improve the results and that, in most cases, statistics about the tweets and the users who posted them are more meaningful than the tweet text itself. We also demonstrate the importance of using ground-truth data from security companies that previous works did not consider.
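The abstract does not specify the data layout or feature set. The sketch below is a minimal illustration of how per-CVE tweet and user statistics could be aggregated over a configurable time window and paired with an antivirus-derived label. It assumes a hypothetical `Tweet` record, a window that starts at the CVE publication date, and a dictionary mapping hypothetical vendor names to the CVEs their signatures reference. Field names, vendor names, and features are illustrative assumptions, not the authors' pipeline. Rows built this way could then be passed to a classifier such as LightGBM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Tweet:
    # Hypothetical record: these fields are illustrative, not the paper's schema.
    cve_id: str
    user_id: str
    created_at: datetime
    retweets: int
    likes: int
    user_followers: int
    user_verified: bool


def window_features(tweets, cve_id, published, window_days):
    """Aggregate tweet- and user-level statistics for one CVE, using only
    tweets posted in [published, published + window_days).
    Assumption: the window starts at the CVE publication date."""
    end = published + timedelta(days=window_days)
    hits = [t for t in tweets
            if t.cve_id == cve_id and published <= t.created_at < end]
    if not hits:
        # No discussion inside the window, so every feature is zero.
        return {"n_tweets": 0, "n_users": 0, "mean_retweets": 0.0,
                "mean_likes": 0.0, "max_followers": 0, "frac_verified": 0.0}
    return {
        "n_tweets": len(hits),
        "n_users": len({t.user_id for t in hits}),  # distinct posters
        "mean_retweets": mean([t.retweets for t in hits]),
        "mean_likes": mean([t.likes for t in hits]),
        "max_followers": max(t.user_followers for t in hits),
        "frac_verified": sum(t.user_verified for t in hits) / len(hits),
    }


def exploited_label(cve_id, av_signatures, vendors):
    """Hypothetical ground truth: 1 if any of the chosen antivirus vendors
    has a signature referencing cve_id, else 0."""
    return int(any(cve_id in av_signatures.get(v, set()) for v in vendors))


# Usage with made-up data (not from the paper).
t0 = datetime(2021, 1, 1)
tweets = [
    Tweet("CVE-2021-0001", "u1", t0 + timedelta(days=1), 10, 20, 500, True),
    Tweet("CVE-2021-0001", "u2", t0 + timedelta(days=3), 0, 4, 100, False),
    Tweet("CVE-2021-0001", "u1", t0 + timedelta(days=20), 50, 90, 500, True),
]
signatures = {"VendorA": {"CVE-2021-0001"}, "VendorB": set()}

short = window_features(tweets, "CVE-2021-0001", t0, window_days=7)
long = window_features(tweets, "CVE-2021-0001", t0, window_days=30)
label_a = exploited_label("CVE-2021-0001", signatures, ["VendorA", "VendorB"])
label_b = exploited_label("CVE-2021-0001", signatures, ["VendorB"])
```

Changing `window_days` and the `vendors` list loosely mirrors the window-size and ground-truth experiments described in the abstract. The same CVE can get different features or a different label depending on those choices.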