LG AI CL ML | Aug 27, 2018

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach

Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra

arXiv:1808.08773v349.41114 citations | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of cross-lingual word representation for multilingual NLP applications, offering an incremental improvement over existing methods.

The paper learns bilingual mappings from monolingual embeddings and a bilingual dictionary. Its geometric approach decouples the source-to-target transformation into language-specific rotations and a similarity metric in a common latent space, and the authors report that it outperforms previous methods on bilingual lexicon induction and cross-lingual word similarity tasks.

We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples learning the transformation from the source language to the target language into (a) learning rotations for language-specific embeddings to align them to a common space, and (b) learning a similarity metric in the common space to model similarities between the embeddings. We model the bilingual mapping problem as an optimization problem on smooth Riemannian manifolds. We show that our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We also generalize our framework to represent multiple languages in a common latent space. In particular, the latent space representations for several languages are learned jointly, given bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in a zero-shot word translation setting. Our implementation is available at https://github.com/anoopkunchukuttan/geomm.
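The decoupling described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation and involves no training: the orthogonal matrices and the metric are random placeholders, and the names `random_orthogonal`, `random_spd`, `latent_similarity`, `rotations`, and `B` are hypothetical. It assumes each language has an orthogonal matrix that maps its embeddings into the common space and that one symmetric positive-definite matrix acts as the shared similarity metric there.

```python
import numpy as np


def random_orthogonal(d, rng):
    # Placeholder for a learned rotation: the Q factor of a QR
    # decomposition is orthogonal (Q.T @ Q = I). Assumption: a real
    # system would learn this matrix from a bilingual dictionary.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def random_spd(d, rng):
    # Placeholder for a learned metric: A @ A.T + d*I is symmetric
    # positive definite.
    a = rng.standard_normal((d, d))
    return a @ a.T + d * np.eye(d)


def latent_similarity(x, z, U_src, U_tgt, B):
    # Step (a): align each embedding to the common space with its
    # language-specific orthogonal matrix.
    # Step (b): score the aligned pair with the shared metric B.
    return (U_src.T @ x) @ B @ (U_tgt.T @ z)


rng = np.random.default_rng(0)
d = 4

# One orthogonal matrix per language, one metric shared by all
# languages (hypothetical language codes, toy dimension).
rotations = {lang: random_orthogonal(d, rng) for lang in ("en", "es", "fr")}
B = random_spd(d, rng)

# Toy "embeddings" for a Spanish and a French word.
x_es = rng.standard_normal(d)
z_fr = rng.standard_normal(d)

# The es-fr pair is scored through the common space, even if no
# es-fr dictionary had been used to obtain the matrices.
score = latent_similarity(x_es, z_fr, rotations["es"], rotations["fr"], B)

# The same score written as a single bilinear map between the two
# original embedding spaces.
W = rotations["es"] @ B @ rotations["fr"].T
```

Because every language is aligned to the same space and scored with the same metric, any pair of languages can be compared once their individual matrices are known, which is the intuition behind the zero-shot word translation setting mentioned in the abstract.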
