Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders
This work addresses the challenge of cross-lingual NLP for languages that lack parallel data, but the contribution is incremental, since it builds on existing adversarial and autoencoder methods.
The paper tackles the problem of learning cross-lingual word vector representations without parallel text by using an adversarial autoencoder to map monolingual word vectors between languages, presenting preliminary qualitative results.
Current approaches to learning vector representations of text that are compatible between different languages usually require some amount of parallel text, aligned at word, sentence or at least document level. We hypothesize, however, that different natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analyzing the monolingual distribution of words. To evaluate this hypothesis, we propose a scheme to map word vectors trained on a source language to vectors semantically compatible with word vectors trained on a target language using an adversarial autoencoder. We present preliminary qualitative results and discuss possible future developments of this technique, such as applications to cross-lingual sentence representations.
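To make the proposed scheme more concrete, the following is a minimal sketch of the three objectives an adversarial autoencoder of this kind would typically combine. It is an illustration under assumptions, not the architecture used in the paper: the linear encoder and decoder, the logistic discriminator, the embedding dimensions, the random stand-in embeddings, and all names (`init_params`, `map_to_target`, `losses`) are hypothetical. The encoder maps source-language word vectors into the target embedding space, the decoder reconstructs the source vectors from the mapped ones, and the discriminator tries to tell mapped vectors apart from real target-language vectors.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(dim_src, dim_tgt, rng):
    """Hypothetical linear encoder/decoder and logistic discriminator."""
    return {
        "enc": rng.normal(scale=0.1, size=(dim_src, dim_tgt)),  # source -> target space
        "dec": rng.normal(scale=0.1, size=(dim_tgt, dim_src)),  # target space -> source
        "disc_w": rng.normal(scale=0.1, size=(dim_tgt,)),       # discriminator weights
        "disc_b": 0.0,                                          # discriminator bias
    }


def map_to_target(params, src_batch):
    """Encode source word vectors into the target embedding space."""
    return src_batch @ params["enc"]


def losses(params, src_batch, tgt_batch, eps=1e-8):
    """Compute the three objectives for one batch (no parameter update)."""
    mapped = map_to_target(params, src_batch)
    recon = mapped @ params["dec"]

    # Autoencoder term: the mapping should preserve enough information
    # to recover the original source vector.
    recon_loss = np.mean(np.sum((recon - src_batch) ** 2, axis=1))

    # Discriminator outputs: probability that a vector is a real target vector.
    p_real = sigmoid(tgt_batch @ params["disc_w"] + params["disc_b"])
    p_fake = sigmoid(mapped @ params["disc_w"] + params["disc_b"])

    # Discriminator term: label real target vectors 1 and mapped vectors 0.
    disc_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1.0 - p_fake + eps))

    # Adversarial term for the encoder: make mapped vectors look real.
    gen_loss = -np.mean(np.log(p_fake + eps))

    return {
        "reconstruction": recon_loss,
        "discriminator": disc_loss,
        "generator": gen_loss,
    }


# Random stand-ins for monolingual word vectors (assumed sizes, not real data).
rng = np.random.default_rng(0)
src_vectors = rng.normal(size=(32, 50))  # 32 source words, 50 dimensions
tgt_vectors = rng.normal(size=(32, 40))  # 32 target words, 40 dimensions

params = init_params(dim_src=50, dim_tgt=40, rng=rng)
batch_losses = losses(params, src_vectors, tgt_vectors)
```

Note that the source and target batches are sampled independently, so no word-level alignment is required. In a full training loop, the discriminator would be updated to reduce the discriminator loss while the encoder and decoder are updated to reduce the reconstruction and generator losses, usually in alternation.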