CLOct 14, 2019

Mapping Supervised Bilingual Word Embeddings from English to low-resource languages

arXiv:1910.06411v10.2Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of natural language processing for low-resource languages, though it is incremental as it applies an existing mapping method to new languages.

The paper tackled the problem of learning bilingual word embeddings for low-resource languages by mapping English embeddings to languages like Estonian and Hungarian using a supervised approach, reporting accuracy scores that suggest feasibility for tasks like machine translation with some bilingual data.

It is very challenging to work with low-resource languages due to the inadequate availability of data. Using a dictionary to map independently trained word embeddings into a shared vector space has proved to be very useful in learning bilingual embeddings in the past. Here we have tried to map individual embeddings of words in English and their corresponding translated words in low-resource languages like Estonian, Slovenian, Slovakian, and Hungarian. We have used a supervised learning approach. We report accuracy scores through various retrieval strategies which show that it is possible to approach challenging tasks in Natural Language Processing like machine translation for such languages, provided that we have at least some amount of proper bilingual data. We also conclude that we can follow an unsupervised learning path on monolingual text data as that is more suitable for low-resource languages.

View on arXiv PDF Code

Similar