ANEA: Distant Supervision for Low-Resource Named Entity Recognition
ANEA addresses the challenge of obtaining labeled data for low-resource NER and makes distant supervision more accessible and effective, though the contribution is incremental because it builds on existing distant supervision methods.
The paper tackles low-resource named entity recognition by introducing ANEA, a tool for distant supervision based on entity lists. Adding the distantly supervised data it produces increased F1-scores by 18 points on average across six low-resource scenarios.
Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists. However, to be used effectively, the distant supervision must be easy to gather. In this work, we present ANEA, a tool to automatically annotate named entities in texts based on entity lists. It spans the whole pipeline from obtaining the lists to analyzing the errors of the distant supervision. A tuning step allows the user to improve the automatic annotation with their linguistic insights without labelling or checking all tokens manually. In six low-resource scenarios, we show that the F1-score can be increased by on average 18 points through distantly supervised data obtained by ANEA.
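The core idea of annotating text from entity lists can be illustrated with a minimal Python sketch. This is not ANEA's actual implementation or API: the function name `annotate`, the BIO label scheme, the lowercase matching, and the longest-match strategy are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical type: label -> set of entity names, each a tuple of lowercase tokens.
EntityLists = Dict[str, Set[Tuple[str, ...]]]


def annotate(tokens: List[str], entity_lists: EntityLists) -> List[str]:
    """Assign BIO labels to tokens by longest-match lookup in entity lists.

    Illustrative sketch only; matching is case-insensitive by assumption.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Longest entity name (in tokens) bounds the span lengths we need to try.
    max_len = max(
        (len(name) for names in entity_lists.values() for name in names),
        default=0,
    )
    i = 0
    while i < len(tokens):
        matched = False
        # Try longer spans first so a multi-token name wins over its prefix.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            for label, names in entity_lists.items():
                if span in names:
                    labels[i] = f"B-{label}"
                    for j in range(i + 1, i + n):
                        labels[j] = f"I-{label}"
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return labels


# Toy entity lists; in practice these would come from an external resource.
toy_lists: EntityLists = {
    "PER": {("ada", "lovelace")},
    "LOC": {("london",), ("new", "york")},
}
sentence = ["Ada", "Lovelace", "moved", "to", "London", "."]
print(list(zip(sentence, annotate(sentence, toy_lists))))
```

Plain list matching like this tends to produce false positives for ambiguous names and misses entities absent from the lists. Those are the kinds of errors the tuning and error-analysis steps described in the abstract are meant to help the user reduce.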