CL, LG · Sep 10, 2021

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

arXiv:2109.05003v1 · 673 citations
Originality: Incremental advance
AI Analysis

This work addresses noisy labels in distantly-supervised named entity recognition, a problem that matters for applications needing automated entity extraction without manual annotation. The contribution is incremental in that it builds on existing methods.

The paper trains named entity recognition models on distantly-labeled data, which is often noisy and incomplete. It proposes a noise-robust learning scheme and a self-training method with language model augmentations, and reports outperforming existing distantly-supervised models by significant margins on three benchmark datasets.

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
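
To make the setting concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) how distant labels can be produced by matching text against a knowledge base, and why they come out incomplete, and (b) one possible shape for a noise-robust loss and a noisy label removal step. The abstract names neither the loss nor the removal criterion, so the generalized cross-entropy form and the confidence threshold used here are assumptions chosen for illustration. The toy knowledge base `KB`, the function names, and the parameters `q` and `tau` are all hypothetical.

```python
import math

# Hypothetical toy knowledge base: lowercased surface form -> entity type.
KB = {
    ("barack", "obama"): "PER",
    ("new", "york"): "LOC",
    ("google",): "ORG",
}


def distant_labels(tokens, kb, max_len=3):
    """Label tokens by greedy longest-match lookup against the knowledge base."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in kb:
                for j in range(i, i + n):
                    labels[j] = kb[span]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


def gce_loss(p_label, q=0.7):
    """Generalized cross-entropy, (1 - p^q) / q, for one token.

    Assumption: used here only as an example of a noise-robust loss.
    It is bounded by 1/q, so a confidently contradicted (possibly wrong)
    label contributes less than it would under standard cross-entropy.
    """
    return (1.0 - p_label ** q) / q


def keep_mask(probs_of_distant_label, tau=0.5):
    """Assumed removal rule: keep a token only if the model assigns its
    distant label a probability of at least tau."""
    return [p >= tau for p in probs_of_distant_label]


tokens = ["Barack", "Obama", "visited", "New", "York", "with", "Merkel"]
labels = distant_labels(tokens, KB)
# "Merkel" is absent from the toy knowledge base, so it is labeled "O":
# an instance of the incomplete labels the abstract describes.
print(list(zip(tokens, labels)))
```

In this sketch, tokens dropped by `keep_mask` would simply be excluded from the loss in later training steps. The self-training stage with language model augmentations is not shown.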

Code Implementations: 1 repo