CL | May 22, 2025

Does Synthetic Data Help Named Entity Recognition for Low-Resource Languages?

arXiv:2505.16814v39.64 citations | h-index: 20 | IJCNLP-AACL

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of limited labeled training data for NER in low-resource languages. Its contribution is an incremental improvement in data augmentation methods for NLP.

The paper tackles Named Entity Recognition (NER) for low-resource languages by exploring the use of synthetic data. It finds that synthetic data holds promise, but that results vary significantly across the 11 languages studied, which come from diverse language families.

Named Entity Recognition (NER) for low-resource languages aims to produce robust systems for languages where limited labeled training data is available, and has been an area of increasing interest within NLP. Data augmentation is a common way to increase the amount of labeled data in low-resource settings. In this paper, we explore the role of synthetic data in the context of multilingual, low-resource NER, considering 11 languages from diverse language families. Our results suggest that synthetic data does in fact hold promise for low-resource language NER, though we see significant variation between languages.
