CL May 17, 2020

Cross-Lingual Word Embeddings for Turkic Languages

arXiv:2005.08340v1997 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the low-resource constraint for Turkic languages by enabling knowledge transfer from resource-rich languages, but it is incremental: it applies existing methods to new data.

The paper tackles the problem of learning cross-lingual word embeddings for low-resource Turkic languages (Turkish, Uzbek, Azeri, Kazakh, Kyrgyz) by applying established alignment techniques with new bilingual dictionaries. The result is improved bilingual dictionary induction and a slight benefit in sentiment analysis for Uzbek.

There has been increasing interest in learning cross-lingual word embeddings to transfer knowledge obtained from a resource-rich language, such as English, to lower-resource languages for which annotated data is scarce, such as Turkish, Russian, and many others. In this paper, we present the first viability study of established techniques to align monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh and Kyrgyz, members of the Turkic family, which is heavily affected by the low-resource constraint. Those techniques are known to require little explicit supervision, mainly in the form of bilingual dictionaries, and are therefore easily adaptable to different domains, including low-resource ones. We obtain new bilingual dictionaries and new word embeddings for these languages and show the steps for obtaining cross-lingual word embeddings using state-of-the-art techniques. Then, we evaluate the results using the bilingual dictionary induction task. Our experiments confirm that the obtained bilingual dictionaries outperform previously available ones, and that word embeddings from a low-resource language can benefit from resource-rich, closely related languages when they are aligned together. Furthermore, evaluation on an extrinsic task (sentiment analysis on Uzbek) shows that monolingual word embeddings can benefit, although slightly, from cross-lingual alignments.
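To make the alignment and evaluation steps concrete, here is a minimal sketch of one widely used dictionary-supervised technique, orthogonal Procrustes alignment, followed by bilingual dictionary induction via nearest-neighbour search. The abstract does not name the specific methods the authors used, so the choice of Procrustes is an assumption, and the embeddings below are synthetic stand-ins rather than real Turkic-language vectors. All function names (`procrustes_map`, `induce_dictionary`, `precision_at_1`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def normalize_rows(m):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def procrustes_map(src, tgt):
    """Return the orthogonal W minimizing ||src @ W - tgt||_F.

    Row i of src and row i of tgt are the embeddings of one
    seed-dictionary pair (source word, target translation).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def induce_dictionary(src, tgt, w):
    """Map source rows with W and return the index of the nearest target row."""
    sims = normalize_rows(src @ w) @ normalize_rows(tgt).T
    return sims.argmax(axis=1)


def precision_at_1(pred, gold):
    """Fraction of source words whose top-ranked translation is correct."""
    return float(np.mean(pred == gold))


# Synthetic stand-in data (an assumption, not the paper's data): the "source"
# space is a rotated, slightly noisy copy of the "target" space.
rng = np.random.default_rng(0)
n_words, dim, n_seed = 200, 20, 100
tgt_emb = normalize_rows(rng.normal(size=(n_words, dim)))
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_emb = tgt_emb @ true_rot.T + 0.01 * rng.normal(size=(n_words, dim))

# Learn the mapping from the first n_seed pairs (the seed dictionary) ...
w = procrustes_map(src_emb[:n_seed], tgt_emb[:n_seed])

# ... and evaluate dictionary induction on the held-out words.
pred = induce_dictionary(src_emb[n_seed:], tgt_emb, w)
gold = np.arange(n_seed, n_words)
p1 = precision_at_1(pred, gold)
print(f"held-out precision@1: {p1:.2f}")
```

On real embeddings, the two spaces are only approximately related by a rotation, so induction accuracy depends heavily on the quality and coverage of the seed dictionary. That is consistent with the abstract's emphasis on building better bilingual dictionaries.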

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
