CL · May 1, 2017

Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary

arXiv:1705.00424v1 · 75 citations

Originality: Highly original

AI Analysis

This work addresses the scarcity of parallel data for many languages, making model transfer possible for tagging low-resource languages.

The paper tackles tagging low-resource languages without parallel corpora by combining a bilingual dictionary with cross-lingual word embeddings. Its jointly trained neural model shows substantial empirical improvements over baseline techniques, and its active learning heuristics improve over competitive benchmark methods.

Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language and its associated annotated corpora. However, parallel data is not readily available for many languages, limiting the applicability of these approaches. We address this drawback with a framework that takes advantage of cross-lingual word embeddings trained solely on a high-coverage bilingual dictionary. We propose a novel neural network model for joint training from both sources of data based on cross-lingual word embeddings, and show substantial empirical improvements over baseline techniques. We also propose several active learning heuristics, which result in improvements over competitive benchmark methods.
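The abstract does not spell out the paper's active learning heuristics. For orientation only, the sketch below shows a common generic baseline, entropy-based uncertainty sampling: rank unlabeled sentences by how unsure the tagger is and send the most uncertain ones to an annotator. This is not the authors' method. It assumes (hypothetically) that the tagger yields a probability distribution over tags for every token, and the function names `token_entropy`, `sentence_uncertainty`, and `select_for_annotation` are illustrative inventions.

```python
import math
from typing import List, Sequence

# Hypothetical representation: one predicted tag distribution per token,
# e.g. the softmax output of a neural tagger. Not taken from the paper.
TokenDist = Sequence[float]


def token_entropy(dist: TokenDist) -> float:
    """Shannon entropy (in nats) of one token's predicted tag distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def sentence_uncertainty(sentence: Sequence[TokenDist]) -> float:
    """Mean token entropy, so long sentences are not favoured merely for length."""
    if not sentence:
        return 0.0
    return sum(token_entropy(d) for d in sentence) / len(sentence)


def select_for_annotation(pool: List[Sequence[TokenDist]], k: int) -> List[int]:
    """Return indices of the k most uncertain sentences in the unlabeled pool."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_uncertainty(pool[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident sentence
        [[0.34, 0.33, 0.33], [0.50, 0.25, 0.25]],  # uncertain sentence
        [[0.70, 0.20, 0.10]],                      # in between
    ]
    print(select_for_annotation(pool, 2))  # prints [1, 2]
```

Averaging entropy over tokens is one design choice among several; summing instead would favour longer sentences, which may or may not be desirable when annotation cost scales with sentence length.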

