Towards Dark Jargon Interpretation in Underground Forums
The work addresses the challenge of interpreting hidden illicit language, which matters to law enforcement and security analysts, but the contribution appears incremental because it builds on existing methods.
The paper tackles the problem of automatically identifying and interpreting dark jargons in underground forums, formalizing it as a mapping from dark words to clean words. It reports that the method outperforms a related method on simulated data and, under manual evaluation, detects dark jargons in a real-world underground forum dataset.
Dark jargons are benign-looking words that have hidden, sinister meanings and are used by participants of underground forums for illicit behavior. For example, the dark term "rat" is often used in lieu of "Remote Access Trojan". In this work we present a novel method towards automatically identifying and interpreting dark jargons. We formalize the problem as a mapping from dark words to "clean" words with no hidden meaning. Our method makes use of interpretable representations of dark and clean words in the form of probability distributions over a shared vocabulary. In our experiments we show our method to be effective in terms of dark jargon identification, as it outperforms another related method on simulated data. Using manual evaluation, we show that our method is able to detect dark jargons in a real-world underground forum dataset.
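The abstract states that words are represented as probability distributions over a shared vocabulary, but it does not say how those distributions are built or compared. The sketch below is one minimal way to make the idea concrete, under assumptions that are ours and not the paper's: each word is represented by the smoothed distribution of context words within a fixed window; a word is flagged as likely dark jargon when its distribution in the forum corpus diverges from its distribution in a clean corpus; and a dark word is interpreted as the clean candidate whose distribution is closest, using Jensen-Shannon divergence. The function names (`context_distribution`, `darkness_score`, `interpret`) and the toy corpora are hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy corpora (not from the paper): "rat" is used in its dark
# sense (Remote Access Trojan) in DARK_TOKENS and in its literal sense in CLEAN_TOKENS.
DARK_TOKENS = ("i sell a rat to spy on victim webcam . "
               "this rat gives remote access to victim pc").split()
CLEAN_TOKENS = ("the trojan gives remote access to victim pc . "
                "a trojan can spy on victim webcam . "
                "the mouse eats cheese in the kitchen . "
                "a rat eats cheese in the barn").split()
# Shared vocabulary: every distribution is defined over the same word set.
VOCAB = sorted(set(DARK_TOKENS) | set(CLEAN_TOKENS))


def context_distribution(tokens, target, vocab, window=2, alpha=0.01):
    """Assumed representation: smoothed distribution over `vocab` of the words
    occurring within `window` positions of `target` (alpha must be > 0)."""
    vocab_set = set(vocab)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] in vocab_set:
                counts[tokens[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def darkness_score(word, dark_tokens, clean_tokens, vocab):
    """Identification: how differently `word` is used in the dark corpus than in
    the clean corpus (higher means more likely to be dark jargon)."""
    return js_divergence(context_distribution(dark_tokens, word, vocab),
                         context_distribution(clean_tokens, word, vocab))


def interpret(dark_word, dark_tokens, clean_tokens, candidates, vocab):
    """Interpretation: map `dark_word` to the clean candidate whose clean-corpus
    distribution is closest to the dark word's dark-corpus distribution."""
    p_dark = context_distribution(dark_tokens, dark_word, vocab)
    scores = {c: js_divergence(p_dark, context_distribution(clean_tokens, c, vocab))
              for c in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    for w in ("rat", "victim"):
        print(w, round(darkness_score(w, DARK_TOKENS, CLEAN_TOKENS, VOCAB), 3))
    best, _ = interpret("rat", DARK_TOKENS, CLEAN_TOKENS, ["trojan", "mouse"], VOCAB)
    print("rat ->", best)  # trojan on this toy data
```

On this toy data, "rat" receives a higher darkness score than "victim", whose contexts are nearly the same in both corpora, and its dark usage maps to "trojan" rather than "mouse". Jensen-Shannon divergence is chosen here only because it is symmetric and bounded, which keeps scores comparable across words; a realistic setup would need large corpora and a curated list of clean candidate words.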