CL · Apr 14, 2024

Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields

ETH Zurich

arXiv:2404.09383v151.51115 citations · h-index: 45 · IJCNLP

Originality: Incremental advance

AI Analysis

The paper addresses the shortage of annotated NER data in many languages. The advance is incremental because it builds on existing transfer learning and neural methods.

The paper tackles low-resource named entity recognition with a cross-lingual, character-level neural CRF that is trained jointly on high-resource and low-resource languages, so that what is learned from the former transfers to the latter. It reports an improvement of up to 9.8 F1 points over a log-linear CRF baseline.

Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages, it is infeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly. Learning character representations for multiple related languages allows transfer among the languages, improving F1 by up to 9.8 points over a log-linear CRF baseline.
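A CRF, as mentioned in the abstract, scores whole tag sequences rather than individual tokens. As a generic illustration only (not the authors' implementation), the sketch below runs Viterbi decoding for a linear-chain CRF. The function name `viterbi`, the tag set, and every score are made-up assumptions. In the setting the abstract describes, the per-token emission scores would presumably come from a character-level neural network.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the score of `tag` at position t (non-empty list).
    transitions[(prev, cur)] is the score of moving from prev to cur;
    missing pairs default to 0.0.
    """
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores (hypothetical) for the three tokens "Ada Lovelace wrote".
tags = ["O", "B-PER", "I-PER"]
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.5},
    {"O": 0.8, "B-PER": 0.6, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.2},
]
transitions = {
    ("O", "I-PER"): -10.0,     # I-PER should not directly follow O
    ("B-PER", "I-PER"): 1.0,   # I-PER naturally continues B-PER
}
path = viterbi(emissions, transitions, tags)
print(path)  # ['B-PER', 'I-PER', 'O']
```

The transition scores are what separate a CRF from independent per-token classification: here the bonus on `("B-PER", "I-PER")` tips the second token toward `I-PER`, although its emission scores alone are nearly tied.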
