CL · Dec 19, 2017

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

arXiv:1712.06961v21107 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of building bilingual dictionaries for language pairs that lack supervised resources. The advance is incremental because the method builds on existing embedding techniques.

The paper tackles bilingual dictionary induction without prior alignments such as parallel corpora or seed dictionaries. It proposes an unsupervised method that aligns two languages by exploiting structural similarities in their monolingual embeddings, and it reports empirically that the resulting performance is comparable to that obtained with supervised correspondents from a seed dictionary.

Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector spaces to align them such that similar words are mapped to each other. We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.
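To make the idea of exploiting structure within each monolingual space more concrete, here is a minimal toy sketch. It is not the paper's algorithm. It only illustrates one structural cue: a word's cosine similarities to the other words in its own language are unchanged by a rotation of the space, so they can be compared across two independently oriented spaces without a seed dictionary. The words, the 2-D vectors, and the helper names (`similarity_profiles`, `induce_dictionary`) are all hypothetical, and the sketch assumes both vocabularies have the same size and that the target space is an exact rotation of the source space, which real embeddings do not satisfy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_profiles(space):
    """Map each word to its sorted similarities to every word in the same space.

    Sorting removes any dependence on vocabulary order, so profiles from two
    different languages can be compared position by position.
    """
    return {
        word: sorted(cosine(vec, other) for other in space.values())
        for word, vec in space.items()
    }


def induce_dictionary(src_space, tgt_space):
    """Pair each source word with the target word whose profile is closest.

    Assumption for this toy: both spaces contain the same number of words.
    """
    src_prof = similarity_profiles(src_space)
    tgt_prof = similarity_profiles(tgt_space)

    def distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return {
        s: min(tgt_prof, key=lambda t: distance(src_prof[s], tgt_prof[t]))
        for s in src_prof
    }


def unit(angle_deg):
    """Unit vector in the plane at the given angle (toy 2-D embedding)."""
    r = math.radians(angle_deg)
    return (math.cos(r), math.sin(r))


# Hypothetical source-language embeddings.
source = {"cat": unit(0), "dog": unit(20), "car": unit(70), "sky": unit(150)}

# Hypothetical target-language embeddings: the same geometry rotated by
# 40 degrees and listed in a different order, with no alignment given.
target = {"cielo": unit(190), "coche": unit(110), "gato": unit(40), "perro": unit(60)}

dictionary = induce_dictionary(source, target)
print(dictionary)
```

On this toy data every source word is paired with its rotated counterpart. With real embeddings the two spaces are only approximately isomorphic, so a profile match of this kind would at best give a noisy starting point that needs further refinement.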

