CL · Feb 25, 2021

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

arXiv:2102.13129v2 · 23 citations
AI Analysis

This work addresses the challenge of obtaining labeled data for low-resource NER by making distant supervision more accessible and effective. It is incremental, however, since it builds on existing distant supervision methods.

The paper tackles low-resource named entity recognition by introducing ANEA, a tool for distant supervision based on entity lists. Across six low-resource scenarios, training with ANEA's distantly supervised data increased F1-scores by 18 points on average.

Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists. However, to be used effectively, the distant supervision must be easy to gather. In this work, we present ANEA, a tool to automatically annotate named entities in texts based on entity lists. It spans the whole pipeline from obtaining the lists to analyzing the errors of the distant supervision. A tuning step allows the user to improve the automatic annotation with their linguistic insights without labelling or checking all tokens manually. In six low-resource scenarios, we show that the F1-score can be increased by 18 points on average through distantly supervised data obtained by ANEA.
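The abstract does not spell out ANEA's matching rules, so the following is only a generic sketch of list-based distant supervision. It assumes whitespace tokenization, case-insensitive exact matching, greedy longest match, and BIO tags. The functions `build_lexicon` and `annotate` and the input format are hypothetical illustrations, not ANEA's API.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of entity-list (gazetteer) annotation; not ANEA's implementation.

def build_lexicon(entity_lists: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each lowercased, whitespace-tokenized entity name to its label."""
    lexicon: Dict[Tuple[str, ...], str] = {}
    for label, names in entity_lists.items():
        for name in names:
            key = tuple(name.lower().split())
            if key:
                lexicon.setdefault(key, label)  # first label wins if a name appears in two lists
    return lexicon

def annotate(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Tag tokens in the BIO scheme using greedy longest match; unmatched tokens get 'O'."""
    max_len = max((len(k) for k in lexicon), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len, label = 0, None
        # Try the longest candidate span first so multi-token names beat their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            found = lexicon.get(tuple(lowered[i:i + n]))
            if found is not None:
                match_len, label = n, found
                break
        if label is None:
            i += 1
            continue
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{label}"
        i += match_len
    return tags

if __name__ == "__main__":
    lists = {"LOC": ["Lagos", "Ibadan"], "PER": ["Wole Soyinka"]}
    tokens = "Wole Soyinka was born near Ibadan".split()
    print(annotate(tokens, build_lexicon(lists)))
```

Running the example yields `['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC']`. Exact matching like this misses inflected forms and can mislabel ambiguous words, which is the kind of noise a user-driven tuning step is meant to reduce.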

Code Implementations: 1 repo
