CL · Mar 9, 2022

Unsupervised Alignment of Distributional Word Embeddings

arXiv:2203.04863v2 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate cross-lingual alignment in NLP tasks like machine translation, though it is incremental as it extends existing methods to distributional embeddings.

The paper tackles the problem of unsupervised word translation by aligning probabilistic word embeddings, achieving better performance than point-vector methods on bilingual lexicon induction across multiple language pairs.

Cross-domain alignment plays a key role in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have been used successfully to infer a bilingual lexicon without relying on supervision. However, current state-of-the-art methods focus only on point vectors, although distributional embeddings have been shown to capture richer semantic information when representing words. In this paper, we propose a stochastic optimization approach for aligning probabilistic embeddings. We evaluate our method on the problem of unsupervised word translation by aligning word embeddings trained on monolingual data. We show that the proposed approach achieves good performance on the bilingual lexicon induction task across several language pairs and outperforms the point-vector-based approach.
