CL May 14, 2019

The Language of Legal and Illegal Activity on the Darknet

arXiv:1905.05543v21097 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of monitoring Darknet activity, a concern for both law enforcement and researchers. Its contribution is incremental: it applies existing NLP tools to a new domain rather than introducing new ones.

The paper investigates the linguistic characteristics of legal and illegal text on the Darknet, taking drug-related websites as a test case. It finds that texts selling legal drugs and texts selling illegal drugs differ from each other, and from a clear-net control, in features such as their POS tag distributions and the coverage of their named entities in Wikipedia.

The non-indexed parts of the Internet (the Darknet) have become a haven for both legal and illegal anonymous activity. Given the magnitude of these networks, scalably monitoring their activity necessarily relies on automated tools, and notably on NLP tools. However, little is known about what characteristics texts communicated through the Darknet have, and how well off-the-shelf NLP tools do on this domain. This paper tackles this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. Taking drug-related websites as a test case, we find that texts for selling legal and illegal drugs have several linguistic characteristics that distinguish them from one another, as well as from the control condition, among them the distribution of POS tags, and the coverage of their named entities in Wikipedia.
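
One way to picture the POS-tag comparison mentioned in the abstract is to turn each corpus into a relative-frequency distribution over tags and measure how far apart two such distributions are. The sketch below is illustrative only and is not taken from the paper: the function names, the toy hand-tagged inputs, and the choice of Jensen-Shannon divergence as the distance are all assumptions. It expects text that has already been tagged by some external tagger.

```python
from collections import Counter
from math import log2


def tag_distribution(tagged_tokens):
    """Relative frequency of each POS tag in a list of (token, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("tagged_tokens must not be empty")
    return {tag: n / total for tag, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two
    tag distributions given as {tag: probability} dicts.

    Assumption: this metric is chosen for illustration; the abstract
    does not say how the authors compared distributions.
    """
    tags = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of tags.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in tags}

    def kl_to_mixture(a):
        # KL(a || m); terms with zero probability contribute nothing.
        return sum(a[t] * log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical, hand-tagged toy inputs (not data from the paper).
listing_a = [("pure", "ADJ"), ("crystal", "NOUN"), ("ships", "VERB"),
             ("worldwide", "ADV"), ("stealth", "NOUN"), ("packaging", "NOUN")]
listing_b = [("this", "DET"), ("supplement", "NOUN"), ("supports", "VERB"),
             ("healthy", "ADJ"), ("sleep", "NOUN")]

divergence = js_divergence(tag_distribution(listing_a),
                           tag_distribution(listing_b))
print(f"JS divergence between toy listings: {divergence:.3f}")
```

A value of 0 means the two tag distributions are identical and 1 means they share no tags. On real corpora the tagged input would come from an off-the-shelf tagger, whose accuracy on Darknet text is itself one of the questions the paper raises.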
