CL | LG | May 18, 2025

Towards DS-NER: Unveiling and Addressing Latent Noise in Distant Annotations

arXiv:2505.12454v1 | 1 citation | h-index: 11 | Has Code | IEEE Trans Knowl Data Eng
Originality: Incremental advance
AI Analysis

This work addresses noise in DS-NER for researchers and practitioners who rely on distant annotations, but it is an incremental advance because it builds on existing noise measurement methods.

The paper tackles latent noise in distantly supervised named entity recognition (DS-NER) by introducing a framework that splits the noise into an unlabeled-entity problem and a noisy-entity problem and provides a specialized solution for each. It reports significant improvements over state-of-the-art methods on eight real-world datasets.

Distantly supervised named entity recognition (DS-NER) has emerged as a cheap and convenient alternative to traditional human annotation, enabling the automatic generation of training data by aligning text with external resources. Despite many efforts on noise measurement methods, few works focus on how the latent noise distribution differs between distant annotation methods. In this work, we explore the effectiveness and robustness of DS-NER from two aspects: (1) distant annotation techniques, which encompass both traditional rule-based methods and the newer large language model supervision approach, and (2) noise assessment, for which we introduce a novel framework. This framework categorizes the noise into the unlabeled-entity problem (UEP) and the noisy-entity problem (NEP) and provides a specialized solution for each. Our proposed method achieves significant improvements on eight real-world distant supervision datasets originating from three different data sources and involving four distinct annotation techniques, confirming its superiority over current state-of-the-art methods.
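To make the two noise categories concrete, here is a minimal sketch of rule-based distant annotation by dictionary matching. It is an illustration only and not the paper's framework: the function names (`distant_annotate`, `noise_report`), the toy gazetteer, and the gold tags are all hypothetical. It also assumes the usual reading of the terms, namely that UEP covers true entity tokens the annotator leaves as `O`, and NEP covers tokens the annotator tags as an entity with the wrong label.

```python
from typing import Dict, List, Tuple


def distant_annotate(tokens: List[str],
                     gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy longest-match dictionary labeling that emits BIO tags."""
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in gazetteer:
                label = gazetteer[span]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def noise_report(gold: List[str], distant: List[str]) -> Dict[str, int]:
    """Token-level counts for the two noise types (assumed definitions)."""
    # UEP: a true entity token that the distant annotator left unlabeled.
    unlabeled = sum(1 for g, d in zip(gold, distant) if g != "O" and d == "O")
    # NEP: a token the distant annotator tagged, but with the wrong tag.
    noisy = sum(1 for g, d in zip(gold, distant) if d != "O" and g != d)
    return {"unlabeled_entity_tokens": unlabeled, "noisy_entity_tokens": noisy}


# Hypothetical toy data: the gazetteer knows two locations and no people.
tokens = ["Washington", "met", "Tim", "Cook", "in", "Paris"]
gazetteer = {("washington",): "LOC", ("paris",): "LOC"}
gold = ["B-PER", "O", "B-PER", "I-PER", "O", "B-LOC"]

distant = distant_annotate(tokens, gazetteer)
report = noise_report(gold, distant)
print(distant)
print(report)
```

In this toy sentence, "Tim Cook" is missing from the gazetteer and stays unlabeled (UEP), while "Washington" is matched as a location although the gold tag is a person (NEP). A real pipeline would compare entity spans instead of single tokens, and an LLM-based annotator would replace the dictionary lookup.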

Code Implementations: 1 repo
