CL · CV · May 18, 2020

Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature

arXiv:2005.08746v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of detecting named entities in Tor Darknet texts for Law Enforcement Agencies, representing an incremental improvement over existing methods.

The paper tackles Named Entity Recognition in noisy user-generated texts by introducing a Local Distance Neighbor feature that replaces expensive gazetteers. It reports state-of-the-art results for the Group, Person and Product categories of the W-NUT-2017 dataset, and entity and surface F1 scores of 52.96% and 50.57%, respectively, on an extended dataset covering Tor Darknet entities.

Named entity recognition in noisy user-generated texts is a difficult task, usually enhanced by incorporating an external resource of information, such as gazetteers. However, gazetteers are task-specific, and they are expensive to build and maintain. This paper adopts and improves the approach of Aguilar et al. by presenting a novel feature, called Local Distance Neighbor, which substitutes for gazetteers. We tested the new approach on the W-NUT-2017 dataset, obtaining state-of-the-art results for the Group, Person and Product categories of Named Entities. Next, we added 851 manually labeled samples to the W-NUT-2017 dataset to account for named entities in the Tor Darknet related to weapons and drug selling. Finally, our proposal achieved entity and surface F1 scores of 52.96% and 50.57%, respectively, on this extended dataset, demonstrating its usefulness for Law Enforcement Agencies to detect named entities in the Tor hidden services.
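The abstract reports two scores, entity F1 and surface F1. As a rough illustration of how they can differ, the sketch below assumes the usual W-NUT-2017 reading: entity F1 counts every typed mention span, while surface F1 counts each distinct (surface string, type) pair only once. The function names, the lenient BIO decoding, the exact-string matching of surface forms, and the toy sentences are all assumptions made for this example; this is not the paper's evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags into typed spans (lenient: a stray I- opens a span)."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def f1(gold: Set, pred: Set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def entity_and_surface_f1(tokens, gold_tags, pred_tags):
    """Return (entity F1, surface F1) for a corpus of tagged sentences."""
    gold_ent, pred_ent, gold_surf, pred_surf = set(), set(), set(), set()
    for sid, (toks, gold, pred) in enumerate(zip(tokens, gold_tags, pred_tags)):
        for tags, ent, surf in ((gold, gold_ent, gold_surf),
                                (pred, pred_ent, pred_surf)):
            for s, e, t in bio_spans(tags):
                ent.add((sid, s, e, t))              # every mention counts
                surf.add((" ".join(toks[s:e]), t))   # unique surface forms only
    return f1(gold_ent, pred_ent), f1(gold_surf, pred_surf)


# Hypothetical toy data: the same product is mentioned twice, found once.
tokens = [["buy", "glock", "19", "now"], ["glock", "19", "in", "stock"]]
gold = [["O", "B-product", "I-product", "O"], ["B-product", "I-product", "O", "O"]]
pred = [["O", "B-product", "I-product", "O"], ["O", "O", "O", "O"]]
entity_f1, surface_f1 = entity_and_surface_f1(tokens, gold, pred)
```

In this toy case the missed second mention lowers entity F1, while surface F1 is unaffected because the surface form was already recovered once. On real data the two scores can move in either direction.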

