CL · AI · LG · Jun 4, 2019

Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

arXiv:1906.01622v3 · 63 citations
Originality: Highly original
AI Analysis

This paper addresses cross-lingual NLP for language pairs whose embeddings are non-isomorphic. It represents a strong, specific gain rather than a broad breakthrough.

The paper tackles the alignment of non-isomorphic cross-lingual word embeddings by proposing Iterative Normalization, which transforms monolingual embeddings to make orthogonal alignment easier. For English-Japanese, word translation accuracy improves from 2% to 44%.

Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
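The two constraints in the abstract (unit-length word vectors and a zero average vector) can be sketched as an alternating loop, followed by a standard orthogonal Procrustes fit for the alignment step. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `iterative_normalize` and `orthogonal_map`, the `n_iter` default, and the use of NumPy are all hypothetical choices made for this example.

```python
import numpy as np


def iterative_normalize(emb, n_iter=5):
    """Alternate unit-length scaling and mean centering.

    emb is a (n_words, dim) matrix of monolingual embeddings.
    The name and the n_iter default are assumptions for this sketch.
    """
    x = np.asarray(emb, dtype=np.float64).copy()
    for _ in range(n_iter):
        # (1) scale every word vector to unit Euclidean length
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        x /= np.maximum(norms, 1e-12)  # guard against all-zero rows
        # (2) subtract the vocabulary mean so the average vector is zero
        x -= x.mean(axis=0, keepdims=True)
    # The loop ends on centering, so the mean is zero up to rounding,
    # while the row norms are only approximately 1 until convergence.
    return x


def orthogonal_map(src, tgt):
    """Orthogonal Procrustes: W minimizing ||src @ W - tgt||_F with W.T @ W = I.

    src and tgt hold the embeddings of seed dictionary pairs, row-aligned.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two monolingual embedding tables.
    src = iterative_normalize(rng.normal(0.5, 1.0, size=(1000, 50)))
    tgt = iterative_normalize(rng.normal(-0.3, 2.0, size=(1000, 50)))
    w = orthogonal_map(src, tgt)
    print("max |row norm - 1|:", np.abs(np.linalg.norm(src, axis=1) - 1).max())
    print("orthogonality error:", np.abs(w.T @ w - np.eye(50)).max())
```

Each language is normalized independently before the mapping is fitted, so the preprocessing can sit in front of any method that learns an orthogonal transformation.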

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
