Machine Translation with Cross-lingual Word Embeddings
This work addresses data scarcity in multilingual NLP applications, but the contribution is incremental because it builds on existing word embedding methods.
The paper tackles the problem of missing data in machine translation by proposing cross-lingual word embeddings that create a single representation for language pairs, enabling classifiers from one language to be used when data is unavailable in another.
Learning word embeddings from distributional information has been studied extensively, and many results are reported in the literature. By contrast, fewer studies address the case of multiple languages. The idea is to learn a single representation for a pair of languages such that semantically similar words lie close to one another in the induced representation, irrespective of the language. In this way, when data are missing for a particular language, classifiers trained on another language can be used.
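To make the transfer idea concrete, here is a minimal sketch, not the paper's method. It assumes a shared English-German embedding space already exists. The two-dimensional vectors, the word list, the sentiment labels, and the helper names (`SHARED_SPACE`, `train_centroids`, `classify`) are all invented for illustration. A nearest-centroid classifier is fitted on labeled English words only and then applied, unchanged, to German words.

```python
import math

# Hypothetical 2-d vectors in a shared English-German space.
# The values are invented for illustration; a real system would learn them.
SHARED_SPACE = {
    ("en", "good"): (0.9, 0.1),
    ("en", "great"): (0.8, 0.2),
    ("en", "bad"): (0.1, 0.9),
    ("en", "awful"): (0.2, 0.8),
    ("de", "gut"): (0.85, 0.15),
    ("de", "schlecht"): (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def train_centroids(labeled_words, lang):
    """Fit one centroid per label using words from a single language."""
    by_label = {}
    for word, label in labeled_words:
        by_label.setdefault(label, []).append(SHARED_SPACE[(lang, word)])
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(word, lang, centroids):
    """Assign the label whose centroid is most similar to the word vector."""
    vec = SHARED_SPACE[(lang, word)]
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Train on English labels only (assumed toy sentiment data).
english_train = [("good", "pos"), ("great", "pos"), ("bad", "neg"), ("awful", "neg")]
centroids = train_centroids(english_train, "en")

# Apply the English-trained classifier to German words with no German labels.
german_predictions = {w: classify(w, "de", centroids) for w in ("gut", "schlecht")}
print(german_predictions)
```

The classifier never sees German labels. It works on the German words only because, in this toy space, translation pairs were placed near each other, which is the property the cross-lingual representation is meant to induce.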