CL · LG · Jun 12, 2019

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

arXiv:1906.05407v1 · 117 citations
Originality: Incremental advance
AI Analysis

The paper addresses a key problem in multilingual natural language processing, but its contribution is incremental, as it builds on existing critiques of mapping approaches.

The paper investigates whether the limitations of cross-lingual word embeddings stem from mapping methods themselves or from more general issues. It finds that, under ideal conditions (parallel corpora), joint learning yields more isomorphic embeddings and better bilingual lexicon induction results.
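To make "more isomorphic" concrete: one common way to quantify how similar two embedding spaces are in structure is to build a nearest-neighbour graph over each space and compare the spectra of the graph Laplacians, in the spirit of the eigenvector similarity metric used in prior work on the isomorphism assumption. The sketch below is a minimal NumPy version under assumptions of our own: row i of both matrices holds a translation pair, graphs use k=10 cosine neighbours, and spectra are compared over the top eigenvalues that cover 90% of the total. The function names are hypothetical, and this is not necessarily the exact measure the paper reports.

```python
import numpy as np

def knn_laplacian(emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Laplacian of a symmetrized cosine k-nearest-neighbour graph (assumed setup)."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-length rows
    sim = x @ x.T                                          # cosine similarities
    np.fill_diagonal(sim, -np.inf)                         # a word is not its own neighbour
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]           # top-k neighbours per word
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)                           # make the graph undirected
    return np.diag(adj.sum(axis=1)) - adj                  # L = D - A

def eigenvector_similarity(src: np.ndarray, tgt: np.ndarray,
                           k: int = 10, energy: float = 0.9) -> float:
    """Sum of squared differences between the top Laplacian eigenvalues.

    Lower means the two spaces are closer to isomorphic. Hypothetical helper.
    """
    ev_s = np.linalg.eigvalsh(knn_laplacian(src, k))[::-1]  # descending order
    ev_t = np.linalg.eigvalsh(knn_laplacian(tgt, k))[::-1]

    def cutoff(ev: np.ndarray) -> int:
        # Smallest number of top eigenvalues whose sum reaches `energy` of the total.
        cum = np.cumsum(ev) / ev.sum()
        return int(np.searchsorted(cum, energy) + 1)

    m = min(cutoff(ev_s), cutoff(ev_t))
    return float(np.sum((ev_s[:m] - ev_t[:m]) ** 2))
```

Because the graph is built from cosine similarities, rotating one space by an orthogonal matrix leaves the score at (numerically) zero. This is exactly the case in which a linear offline mapping would be perfect, whereas real monolingual spaces typically score higher.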

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations. While several authors have questioned the underlying isomorphism assumption, which states that word embeddings in different languages have approximately the same structure, it is not clear whether this is an inherent limitation of mapping approaches or a more general issue when learning cross-lingual embeddings. To answer this question, we experiment with parallel corpora, which allow us to compare offline mapping to an extension of skip-gram that jointly learns both embedding spaces. We observe that, under these ideal conditions, joint learning yields more isomorphic embeddings, is less sensitive to hubness, and obtains stronger results in bilingual lexicon induction. We thus conclude that current mapping methods do have strong limitations, calling for further research to jointly learn cross-lingual embeddings with a weaker cross-lingual signal.
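The offline pipeline described above (independent monolingual spaces, a linear map into a shared space, then evaluation) can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes orthogonal Procrustes as the linear transformation (one common choice; the exact mapping method is not given here), evaluates bilingual lexicon induction as precision@1 under cosine nearest-neighbour retrieval, and measures hubness with the k-occurrence count, i.e. how often each target word appears in the k-nearest-neighbour lists of mapped source words. All function names are hypothetical.

```python
import numpy as np

def procrustes_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F, rows being aligned translation pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def nearest_neighbors(queries: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Index of the most cosine-similar target row for every query row."""
    return np.argmax(normalize(queries) @ normalize(targets).T, axis=1)

def precision_at_1(mapped_src: np.ndarray, tgt: np.ndarray, gold: np.ndarray) -> float:
    """Fraction of source words whose nearest target is the gold translation gold[i]."""
    return float(np.mean(nearest_neighbors(mapped_src, tgt) == gold))

def hub_occurrence(mapped_src: np.ndarray, tgt: np.ndarray, k: int = 10) -> np.ndarray:
    """k-occurrence N_k: how many k-NN lists of mapped source words contain each target word.

    A few target words with very large counts ("hubs") signal hubness.
    """
    sims = normalize(mapped_src) @ normalize(tgt).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=tgt.shape[0])

# Toy usage: a target space that is an exact rotation of the source space,
# i.e. the perfectly isomorphic case that offline mapping assumes.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 50))
r, _ = np.linalg.qr(rng.normal(size=(50, 50)))
y = x @ r
w = procrustes_map(x[:200], y[:200])             # learn the map on a seed dictionary
p1 = precision_at_1(x[200:] @ w, y, np.arange(200, 300))  # test on held-out pairs
counts = hub_occurrence(x @ w, y, k=5)
```

In this idealized toy setting the learned map recovers the rotation and precision@1 on held-out pairs is perfect. The paper's point is that independently trained spaces are not related by such a rotation, so a linear map like this one inherits their structural mismatch, while joint training can produce spaces that are closer to isomorphic in the first place.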
