CL LG Feb 21, 2020

Refinement of Unsupervised Cross-Lingual Word Embeddings

Magdalena Biesialska, Marta R. Costa-jussà

arXiv:2002.09213v10.35 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving cross-lingual embeddings for low-resource and morphologically-rich languages, representing an incremental advancement over existing projection-based approaches.

The paper tackles the problem of aligning cross-lingual word embeddings for morphologically-rich languages. It proposes a self-supervised refinement method that moves word vectors and their translations closer together while enforcing length- and center-invariance. In most cases the method outperforms state-of-the-art approaches on a bilingual lexicon induction task.

Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without any direct bilingual signal. The lion's share of the methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods are mostly based on an orthogonal transformation, which assumes that the language vector spaces are isomorphic. However, this assumption does not necessarily hold, especially for morphologically-rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves vectors of words and their corresponding translations closer to each other and enforces length- and center-invariance, which allows cross-lingual embeddings to be aligned better. The experimental results demonstrate the effectiveness of our approach: in most cases it outperforms state-of-the-art methods in a bilingual lexicon induction task.
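
To make the terms in the abstract concrete, here is a minimal sketch of the generic projection-based baseline it describes: length normalization, mean centering, and an orthogonal (Procrustes) mapping, followed by nearest-neighbour lexicon induction. This is not the authors' refinement method or their released code. The function names, the synthetic data, and the seed-dictionary setup (row i of the source corresponds to row i of the target) are all assumptions made for illustration.

```python
import numpy as np


def normalize(emb):
    """Length-normalize rows, mean-center, then length-normalize again.

    The final rescaling leaves the mean only approximately zero.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb - emb.mean(axis=0, keepdims=True)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def procrustes(src, tgt):
    """Return the orthogonal W that minimizes ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic toy data (hypothetical): 50 "words" in 8 dimensions.
rng = np.random.default_rng(0)
tgt = normalize(rng.normal(size=(50, 8)))

# Build a perfectly isomorphic "source" space by rotating the target space.
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
src = tgt @ rotation.T

# Learn the mapping from aligned pairs (row i of src <-> row i of tgt).
w = procrustes(src, tgt)

# Bilingual lexicon induction by cosine nearest neighbour
# (rows are unit length, so the dot product is the cosine similarity).
pred = (src @ w @ tgt.T).argmax(axis=1)
```

In this toy setting the two spaces are exactly isomorphic, so the orthogonal map recovers the rotation. The abstract's point is that real language pairs, especially morphologically-rich ones, do not satisfy this assumption, which is what motivates refining the alignment afterwards.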

