CL, LG · Dec 10, 2019

Machine Translation with Cross-lingual Word Embeddings

arXiv:1912.10167v2
Originality: Synthesis-oriented
AI Analysis

The paper addresses data scarcity in multilingual NLP applications, but the contribution is incremental because it builds on existing word embedding methods.

The paper tackles the problem of missing data in machine translation by proposing cross-lingual word embeddings that create a single representation for language pairs, enabling classifiers from one language to be used when data is unavailable in another.

Learning word embeddings from distributional information has been studied extensively, and many results are reported in the literature. In contrast, fewer studies have addressed the case of multiple languages. The idea is to learn a single representation for a pair of languages such that semantically similar words lie close to one another in the induced representation, irrespective of the language. In this way, when data are missing for a particular language, classifiers trained on another language can be used.
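The transfer idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the two-dimensional vectors below are hand-picked, hypothetical values standing in for a learned shared English/French space, and the nearest-centroid classifier, the `SHARED_EMBEDDINGS` table, and the function names are all assumptions made for illustration. The point is only that a classifier fitted on English words can label French words when both languages share one vector space.

```python
import math

# Hypothetical toy vectors in a shared English/French space.
# They are hand-picked for illustration, not learned from data.
SHARED_EMBEDDINGS = {
    ("en", "good"): (0.9, 0.1),
    ("en", "great"): (0.8, 0.2),
    ("en", "bad"): (-0.9, 0.1),
    ("en", "awful"): (-0.8, 0.3),
    ("fr", "bon"): (0.85, 0.15),
    ("fr", "mauvais"): (-0.85, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def train_centroids(labelled_words, lang):
    """Average the vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for word, label in labelled_words:
        vec = SHARED_EMBEDDINGS[(lang, word)]
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(x / counts[label] for x in acc)
        for label, acc in sums.items()
    }


def classify(word, lang, centroids):
    """Return the label whose centroid is most similar to the word vector."""
    vec = SHARED_EMBEDDINGS[(lang, word)]
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Labelled data exists only for English.
english_training = [
    ("good", "positive"),
    ("great", "positive"),
    ("bad", "negative"),
    ("awful", "negative"),
]
centroids = train_centroids(english_training, "en")

# The English-trained classifier is applied directly to French words.
french_predictions = {w: classify(w, "fr", centroids) for w in ("bon", "mauvais")}
print(french_predictions)
```

In this toy setting the French words receive the labels of their English neighbours only because their vectors were placed near them by hand; in practice the quality of the transfer depends on how well the learned cross-lingual space aligns the two languages.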

Code Implementations: 1 repo
