AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
This work addresses the challenge of training NER models on incompletely annotated data, which is common when annotators lack domain knowledge. It is incremental, as it builds on existing methods to handle multi-labeled tokens.
The paper tackles the problem of Named Entity Recognition (NER) with incomplete annotations, where only a fraction of entities are labeled, by proposing AdaK-NER, an adaptive top-K approach that helps models focus on a smaller feasible region, resulting in average improvements of 2% in F-score on CoNLL-2003 and over 10% on two Chinese datasets compared to prior state-of-the-art.
State-of-the-art Named Entity Recognition (NER) models rely heavily on large amounts of fully annotated training data. However, accessible data are often incompletely annotated, since annotators usually lack comprehensive knowledge of the target domain. Normally, unannotated tokens are regarded as non-entities by default, whereas we emphasize that these tokens could be either non-entities or part of any entity. Here, we study NER modeling with incompletely annotated data, where only a fraction of the named entities are labeled and the unlabeled tokens are equivalently multi-labeled with every possible label. When multi-labeled tokens are taken into account, the numerous possible paths can distract the model from the gold path (the ground-truth label sequence) during training and thus hinder its learning ability. In this paper, we propose AdaK-NER, an adaptive top-K approach, to help the model focus on a smaller feasible region where the gold path is more likely to be located. We demonstrate the superiority of our approach through extensive experiments on both English and Chinese datasets, with an average improvement of 2% in F-score on CoNLL-2003 and over 10% on two Chinese datasets compared with prior state-of-the-art work.
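
To make the notion of "possible paths" and a "smaller feasible region" concrete, the following is a minimal illustrative sketch, not the AdaK-NER algorithm itself. It assumes a toy three-label set, hypothetical per-token scores, and no transition scores; the function names (candidate_labels, count_paths, top_k_paths) are invented for this example. It shows how unlabeled tokens multiply the number of compatible label sequences, and how keeping only the K highest-scoring sequences shrinks that set.

```python
from itertools import product
import heapq

# Toy label set, assumed purely for illustration.
LABELS = ["O", "B-PER", "I-PER"]

def candidate_labels(partial):
    """An annotated token keeps its label; an unlabeled token (None)
    is treated as multi-labeled with every possible label."""
    return [[lab] if lab is not None else list(LABELS) for lab in partial]

def count_paths(partial):
    """Number of label sequences compatible with the partial annotation."""
    n = 1
    for cands in candidate_labels(partial):
        n *= len(cands)
    return n

def top_k_paths(partial, scores, k):
    """Brute-force the k highest-scoring compatible paths.
    scores[i][label] is a hypothetical per-token score; a real model
    would also use transition scores and a k-best decoder."""
    def path_score(path):
        return sum(scores[i][lab] for i, lab in enumerate(path))
    paths = product(*candidate_labels(partial))
    return heapq.nlargest(k, paths, key=path_score)

# One annotated token followed by two unlabeled tokens.
partial = ["B-PER", None, None]
scores = [
    {"O": 0.1, "B-PER": 0.9, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 0.7},
    {"O": 0.8, "B-PER": 0.1, "I-PER": 0.1},
]
n_paths = count_paths(partial)           # 1 * 3 * 3 compatible paths
best = top_k_paths(partial, scores, 2)   # restrict to the top 2 paths
```

Brute-force enumeration is used here only because the example is tiny; the number of compatible paths grows exponentially with the number of unlabeled tokens, which is why restricting attention to a small set of high-scoring paths matters.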