CL Oct 24, 2019

Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations

arXiv:1910.10893v1997 citations
Originality: Incremental advance
AI Analysis

This paper addresses low-resource sequence labeling for languages that lack parallel data, offering a practical solution for NLP applications across diverse languages, though it builds incrementally on existing multilingual representation methods.

The paper tackles cross-lingual sequence labeling without bilingual resources by proposing MLMA, a multilingual language model with deep semantic alignment. It reports new state-of-the-art NER and POS results across European languages and effective performance on distant language pairs such as English-Chinese.

Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-by-word matching. Such requirements and assumptions are infeasible for most languages, especially for languages with large linguistic distances, e.g., English and Chinese. In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate language-independent representations for cross-lingual sequence labeling. Our methods require only monolingual corpora with no bilingual resources at all and take advantage of deep contextualized representations. Experimental results show that our approach achieves new state-of-the-art NER and POS performance across European languages, and is also effective on distant language pairs such as English and Chinese.

Code Implementations: 1 repo
