CL LG Jun 15, 2017

A Survey Of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard

arXiv:1706.04902v4

593 citations

Originality: Synthesis-oriented

AI Analysis

The survey addresses the need to understand and improve cross-lingual representations, which facilitate natural language processing for low-resource languages. Its contribution is incremental: it synthesizes existing literature rather than introducing new methods.

This survey provides a comprehensive typology of cross-lingual word embedding models and compares their data requirements and objective functions. It finds that many models optimize similar objectives, with differences often arising from optimization strategies and hyper-parameters.

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
