CR · May 11, 2018

Under the Underground: Predicting Private Interactions in Underground Forums

arXiv:1805.04494v1 · 24 citations
Originality: Incremental advance
AI Analysis

Analysts rarely have access to private cybercriminal communications. This work offers a tool for inferring that hidden activity from public data. The advance is incremental: it extends existing forum analysis with a new predictive approach.

The paper predicts private interactions in underground forums, where private messages are rarely available. It proposes a supervised machine learning method that, after a partial leak, uses public threads to forecast private activity. The results show that public information can predict private interactions, though models do not transfer well between forums.

Underground forums where users discuss, buy, and sell illicit services and goods facilitate a better understanding of the economy and organization of cybercriminals. Prior work has shown that private interactions in particular provide a wealth of information about the cybercriminal ecosystem. Yet those messages are seldom available to analysts, except when there is a leak. To address this problem, we propose a supervised machine-learning-based method able to predict which public threads will generate private messages, after a partial leak of such messages has occurred. To the best of our knowledge, we are the first to develop a solution to overcome the barrier posed by limited to no information on private activity for underground forum analysis. Additionally, we propose an automated method for labeling posts, significantly reducing the cost of our approach in the presence of real unlabeled data. This method can be tuned to focus on the likelihood of users receiving private messages, or of threads triggering private interactions. We evaluate the performance of our methods using data from three real forum leaks. Our results show that public information can indeed be used to predict private activity, although prediction models do not transfer well between forums. We also find that neither the length of the leak period nor the time between the leak and the prediction has a significant impact on our technique's performance, and that NLP features dominate the prediction power.
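To make the setup concrete, here is a minimal sketch of the kind of supervised text classifier the abstract describes: public thread text goes in, and a label comes out saying whether the thread triggered private messages during the leaked period. The abstract does not name the classifier or the exact features, so everything here is an assumption for illustration: a bag-of-words Naive Bayes model stands in for the authors' method, and the thread texts, labels, and names (`ThreadClassifier`, `train_threads`) are hypothetical toy data, not the paper's.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real NLP features)."""
    return text.lower().split()


class ThreadClassifier:
    """Multinomial Naive Bayes over bag-of-words thread text.

    Assumption: this is NOT the paper's model, only an illustrative
    supervised classifier trained on labels derived from a partial leak.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of words seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_score(self, text, c):
        total_words = sum(self.word_counts[c].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / total_docs)  # class prior
        for word in tokenize(text):
            if word in self.vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids log(0).
                score += math.log(
                    (self.word_counts[c][word] + 1)
                    / (total_words + len(self.vocab))
                )
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Hypothetical toy data: label 1 means the public thread was linked to
# private messages in the leaked portion of the data, 0 means it was not.
train_threads = [
    ("selling fresh accounts pm me for price", 1),
    ("wts vpn service contact via pm", 1),
    ("buying exploit kit pm offers", 1),
    ("forum rules please read", 0),
    ("tutorial basic python scripting", 0),
    ("general chat what music do you like", 0),
]

texts = [text for text, _ in train_threads]
labels = [label for _, label in train_threads]
model = ThreadClassifier().fit(texts, labels)

print(model.predict("selling vpn accounts pm for price"))
print(model.predict("tutorial python basic rules"))
```

In this toy setting the model is trained and applied on the same (hypothetical) forum. The abstract's finding that models do not transfer well between forums suggests that a vocabulary learned on one forum may not carry over to another, so a sketch like this would need retraining per forum.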
