CL · Dec 19, 2022

MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

Shashank Sonkar, Zichao Wang, Richard G. Baraniuk

arXiv:2212.09723v121.3222 citations · h-index: 108

Originality: Incremental advance

AI Analysis

The paper addresses NER for low-resource languages, a problem that matters for extending NLP applications to underserved linguistic communities. The advance is incremental because it builds on existing pre-trained models.

The paper tackles Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged samples. It introduces MANER, a method that repurposes the <mask> token from pre-trained masked language models for NER prediction, and reports F1 improvements of up to 48%, and 12% on average, over state-of-the-art methods across 100 languages.

This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver accelerating NER systems' progress is the existence of large-scale language corpora that enable NER systems to achieve outstanding performance in languages such as English and French with abundant training data. However, NER for low-resource languages remains relatively unexplored. In this paper, we introduce Mask Augmented Named Entity Recognition (MANER), a new methodology that leverages the distributional hypothesis of pre-trained masked language models (MLMs) for NER. The <mask> token in pre-trained MLMs encodes valuable semantic contextual information. MANER re-purposes the <mask> token for NER prediction. Specifically, we prepend the <mask> token to every word in a sentence for which we would like to predict the named entity tag. During training, we jointly fine-tune the MLM and a new NER prediction head attached to each <mask> token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on state-of-the-art methods by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best-suited to MANER.
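The input construction described in the abstract (prepending the <mask> token to each word and predicting the tag at that mask position) can be sketched as follows. This is a minimal word-level illustration, not the authors' implementation: the function name `mask_augment`, the example sentence, the integer tag ids, and the use of -100 as an "ignore" label (a common PyTorch convention) are all assumptions made for the example, and a real pipeline would also have to handle subword tokenization.

```python
from typing import List, Tuple

MASK = "<mask>"   # mask token string; the exact string depends on the MLM's tokenizer
IGNORE = -100     # assumed "no loss at this position" label (a common PyTorch convention)


def mask_augment(words: List[str], tag_ids: List[int]) -> Tuple[List[str], List[int], List[int]]:
    """Prepend MASK to every word.

    Returns the augmented token list, the index of each inserted MASK,
    and a label list that carries a tag only at MASK positions.
    """
    if len(words) != len(tag_ids):
        raise ValueError("words and tag_ids must have the same length")

    tokens: List[str] = []
    mask_positions: List[int] = []
    labels: List[int] = []
    for word, tag in zip(words, tag_ids):
        mask_positions.append(len(tokens))  # the MASK lands at the current end of the list
        tokens.extend([MASK, word])
        labels.extend([tag, IGNORE])        # tag is predicted at the MASK, not at the word
    return tokens, mask_positions, labels


# Hypothetical example; tag ids are illustrative (0=O, 1=B-ORG, 2=I-ORG, 3=B-LOC).
tokens, positions, labels = mask_augment(
    ["Rice", "University", "is", "in", "Houston"], [1, 2, 0, 0, 3]
)
print(tokens)
print(positions)
print(labels)
```

In a full model under these assumptions, the MLM's hidden states at `positions` would be passed to the NER prediction head, and both the MLM and the head would be fine-tuned jointly on `labels`.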
