Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings
This work addresses the problem of cross-lingual word embedding alignment for NLP applications, offering a novel geometric approach that is particularly effective for distant languages.
The paper tackles unsupervised alignment of word embeddings between languages by formulating it as a domain adaptation problem on the manifold of doubly stochastic matrices. The resulting method outperforms state-of-the-art optimal transport methods on bilingual lexicon induction tasks, with more pronounced improvements for distant language pairs.
We propose a novel manifold-based geometric approach for learning an unsupervised alignment of word embeddings between the source and target languages. Our approach formulates the alignment learning problem as a domain adaptation problem over the manifold of doubly stochastic matrices. This viewpoint arises from the aim of aligning the second-order information of the two language spaces. The rich geometry of the doubly stochastic manifold allows us to employ an efficient Riemannian conjugate gradient algorithm for the proposed formulation. Empirically, the proposed approach outperforms a state-of-the-art optimal transport based approach on the bilingual lexicon induction task across several language pairs. The performance improvement is more significant for distant language pairs.
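
To make the two central objects concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a doubly stochastic matrix can be obtained from a positive matrix by Sinkhorn-style alternating row and column normalization, and it uses a hypothetical second-order mismatch, the squared Frobenius norm of K_src - Y K_tgt Y^T, as a stand-in for "aligning second-order information". The function names (sinkhorn, second_order_mismatch) and this particular objective are assumptions made for illustration; the Riemannian conjugate gradient solver mentioned above is not shown.

```python
def sinkhorn(matrix, n_iters=200):
    """Alternately normalize rows and columns of a positive square matrix.

    The result is approximately doubly stochastic: every row and every
    column sums to 1.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    for _ in range(n_iters):
        # Scale each row so that it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Scale each column so that it sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def second_order_mismatch(k_src, k_tgt, y):
    """Squared Frobenius norm of K_src - Y K_tgt Y^T (illustrative objective).

    K_src and K_tgt stand for second-order (similarity) matrices of the
    source and target embeddings; Y is a candidate alignment matrix.
    """
    aligned = matmul(matmul(y, k_tgt), transpose(y))
    return sum((k_src[i][j] - aligned[i][j]) ** 2
               for i in range(len(k_src)) for j in range(len(k_src)))


if __name__ == "__main__":
    y = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
    print("row sums:", [sum(row) for row in y])
    print("column sums:", [sum(col) for col in zip(*y)])
```

A permutation matrix is a special case of a doubly stochastic matrix. If the target similarity matrix is just a relabelled copy of the source one, the matching permutation drives this illustrative mismatch to zero, which is the intuition behind searching over doubly stochastic alignments.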