CL AI LG ML Aug 31, 2019

Entity Projection via Machine Translation for Cross-Lingual NER

arXiv:1909.05356v21011 citations
Originality: Incremental advance
AI Analysis

This work addresses the lack of annotated NER corpora for many languages, offering a practical solution for cross-lingual applications.

The paper tackles the problem of cross-lingual named entity recognition (NER) by improving annotation-projection methods using machine translation, achieving an average improvement of 4.1 points over state-of-the-art methods on 5 languages and outperforming a monolingual model for Armenian.

Although over 100 languages are supported by strong off-the-shelf machine translation systems, only a subset of them possess large annotated corpora for named entity recognition. Motivated by this fact, we leverage machine translation to improve annotation-projection approaches to cross-lingual named entity recognition. We propose a system that improves over prior entity-projection methods by: (a) leveraging machine translation systems twice: first for translating sentences and subsequently for translating entities; (b) matching entities based on orthographic and phonetic similarity; and (c) identifying matches based on distributional statistics derived from the dataset. Our approach improves upon current state-of-the-art methods for cross-lingual named entity recognition on 5 diverse languages by an average of 4.1 points. Further, our method achieves state-of-the-art F_1 scores for Armenian, outperforming even a monolingual model trained on Armenian source data.
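The orthographic matching step in (b) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the entity has already been translated into the target language, and it substitutes Python's standard `difflib.SequenceMatcher` ratio for whatever similarity measure the paper uses. The function names, the `threshold` value, and the `max_extra` span-length slack are all hypothetical choices made for illustration. Phonetic matching and the distributional statistics in (c) are not covered.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def project_entity(translated_entity, target_tokens, max_extra=1, threshold=0.75):
    """Find the token span in the target sentence that best matches the entity.

    Returns (start, end, score) with `end` exclusive, or None when no span
    reaches the (assumed) similarity threshold.
    """
    n = len(translated_entity.split())
    best = None
    # Allow the target span to be slightly shorter or longer than the entity.
    for length in range(max(1, n - max_extra), n + max_extra + 1):
        for start in range(len(target_tokens) - length + 1):
            span = " ".join(target_tokens[start:start + length])
            score = similarity(translated_entity, span)
            if best is None or score > best[2]:
                best = (start, start + length, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


def project_tags(entities, target_tokens, threshold=0.75):
    """Project (translated_text, label) entities onto target tokens as BIO tags."""
    tags = ["O"] * len(target_tokens)
    for text, label in entities:
        match = project_entity(text, target_tokens, threshold=threshold)
        if match is None:
            continue  # no sufficiently similar span: leave the entity unprojected
        start, end, _ = match
        if any(tag != "O" for tag in tags[start:end]):
            continue  # do not overwrite an earlier projection
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


if __name__ == "__main__":
    # Hypothetical example: entities translated into Spanish, matched against
    # a Spanish sentence whose spelling differs slightly ("Paris" vs "París").
    tokens = ["Angela", "Merkel", "visitó", "París", "ayer"]
    entities = [("Angela Merkel", "PER"), ("Paris", "LOC")]
    print(list(zip(tokens, project_tags(entities, tokens))))
```

A soft threshold lets the match tolerate small spelling differences between the machine-translated entity and its surface form in the target sentence, at the cost of occasional false matches. That trade-off is one plausible reason to combine orthographic matching with the phonetic and distributional signals the abstract lists.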

Code Implementations: 1 repo
