CL · Aug 17, 2023

Task Relation Distillation and Prototypical Pseudo Label for Incremental Named Entity Recognition

arXiv:2308.08793v1 · 19 citations · h-index: 23 · Has Code
AI Analysis

This paper addresses the sequential learning of new entity types without forgetting old ones in natural language processing, and represents a strong incremental improvement.

The paper tackles the problem of catastrophic forgetting and background shift in incremental named entity recognition (INER) by proposing a method called task Relation Distillation and Prototypical pseudo label (RDP), which achieves an average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score over previous state-of-the-art methods.

Incremental Named Entity Recognition (INER) involves the sequential learning of new entity types without accessing the training data of previously learned types. However, INER faces the challenge of catastrophic forgetting specific to incremental learning, further aggravated by background shift (i.e., old and future entity types are labeled as the non-entity type in the current task). To address these challenges, we propose a method called task Relation Distillation and Prototypical pseudo label (RDP) for INER. Specifically, to tackle catastrophic forgetting, we introduce a task relation distillation scheme that serves two purposes: 1) ensuring inter-task semantic consistency across different incremental learning tasks by minimizing inter-task relation distillation loss, and 2) enhancing the model's prediction confidence by minimizing intra-task self-entropy loss. Simultaneously, to mitigate background shift, we develop a prototypical pseudo label strategy that distinguishes old entity types from the current non-entity type using the old model. This strategy generates high-quality pseudo labels by measuring the distances between token embeddings and type-wise prototypes. We conducted extensive experiments on ten INER settings of three benchmark datasets (i.e., CoNLL2003, I2B2, and OntoNotes5). The results demonstrate that our method achieves significant improvements over the previous state-of-the-art methods, with an average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score.
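The two ideas named in the abstract, prototype-based pseudo labels and a self-entropy term, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the distance measure, the relabeling rule, or the loss weighting, so the cosine similarity, the `threshold` parameter, and every function name below (`build_prototypes`, `pseudo_label`, `self_entropy`) are assumptions made for illustration only.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_prototypes(embeddings, old_preds, non_entity="O"):
    """One prototype per old entity type: the mean embedding of the tokens
    that the old model assigns to that type (assumed construction)."""
    groups = {}
    for emb, label in zip(embeddings, old_preds):
        if label != non_entity:
            groups.setdefault(label, []).append(emb)
    return {label: mean_vector(vecs) for label, vecs in groups.items()}


def pseudo_label(embeddings, current_labels, old_preds, prototypes,
                 threshold=0.9, non_entity="O"):
    """Relabel a token marked non-entity in the current task with the old
    model's prediction, but only if its embedding is close enough to that
    type's prototype. The threshold rule is a hypothetical choice."""
    relabeled = []
    for emb, cur, old in zip(embeddings, current_labels, old_preds):
        close = old in prototypes and cosine(emb, prototypes[old]) >= threshold
        relabeled.append(old if cur == non_entity and close else cur)
    return relabeled


def self_entropy(probs):
    """Shannon entropy of one predicted distribution; minimizing it pushes
    the model toward more confident predictions."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy data: 2-d token embeddings, current-task labels, old-model predictions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
current_labels = ["O", "O", "B-LOC", "O", "O"]
old_preds = ["B-PER", "B-PER", "O", "O", "B-PER"]

prototypes = build_prototypes(embeddings, old_preds)
new_labels = pseudo_label(embeddings, current_labels, old_preds, prototypes)
print(new_labels)  # ['B-PER', 'B-PER', 'B-LOC', 'O', 'O']
```

In this toy run the first two tokens recover the old type `B-PER`, while the last token keeps the non-entity label because its similarity to the `B-PER` prototype (about 0.88) falls below the assumed threshold of 0.9. The `self_entropy` helper shows only the per-token quantity; how it is aggregated and combined with the relation distillation loss is left open here.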

Code Implementations: 1 repo