CL
Sep 9, 2019

What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis

Xiaolei Huang, Jonathan May, Nanyun Peng

arXiv:1909.03598v1
30.21008 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of named entity recognition for languages with limited training data, offering incremental insights into transfer mechanisms.

The paper addresses named entity recognition for low-resource languages by analyzing cross-lingual transfer. It proposes a simple neural architecture that performs competitively with the state of the art, then examines two transferable factors: sequential order and multilingual embeddings.

Building named entity recognition (NER) models for languages that do not have much training data is a challenging task. While recent work has shown promising results on cross-lingual transfer from high-resource languages to low-resource languages, it is unclear what knowledge is transferred. In this paper, we first propose a simple and efficient neural architecture for cross-lingual NER. Experiments show that our model achieves performance competitive with the state of the art. We further analyze how transfer learning works for cross-lingual NER on two transferable factors, sequential order and multilingual embeddings, and investigate how model performance varies across entity lengths. Finally, we conduct a case study on a non-Latin language, Bengali, which suggests that leveraging knowledge from Wikipedia is a promising direction for further improving model performance. Our results can shed light on future research for improving cross-lingual NER.
