CL | Apr 20, 2019

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings

arXiv:1904.09446v1996 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of aligning word embeddings across languages without high-quality parallel data, particularly for typologically distant language pairs, though the advance appears incremental.

The paper tackles the problem of poor performance in cross-lingual word embeddings for distant languages by proposing a weakly-supervised concept-based adversarial training method, which improves performance over previous unsupervised methods.

Distributed representations of words, which map each word to a continuous vector, have proven useful in capturing important linguistic information not only within a single language but also across languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high-quality parallel data such as a dictionary or a sentence-aligned corpus. However, without post-refinement, the preliminary mapping these methods produce is not good, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which, for most languages, improves on the performance of previous unsupervised adversarial methods, especially for typologically distant language pairs.
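To make the notion of a "mapping matrix" concrete, here is a minimal sketch of what such a matrix does once it exists. It is not the paper's adversarial method: it swaps in orthogonal Procrustes fitted on a small seed dictionary, and it uses synthetic data in which the source space is an exact rotation of the target space. The function names, dimensions, and toy data are all assumptions made for illustration.

```python
import numpy as np

def procrustes_mapping(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F for row-aligned word pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def nearest_target(mapped, tgt):
    """For each mapped source row, index of the most cosine-similar target row."""
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(a @ b.T, axis=1)

# Toy data (assumption): 50 "words" in 8 dimensions; word i in the source
# language translates to word i in the target language.
rng = np.random.default_rng(0)
tgt_emb = rng.normal(size=(50, 8))
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
src_emb = tgt_emb @ true_rotation.T  # source space is a rotated copy of the target

# Fit the mapping on a 20-pair seed dictionary, then translate all 50 words.
W = procrustes_mapping(src_emb[:20], tgt_emb[:20])
pred = nearest_target(src_emb @ W, tgt_emb)
accuracy = float(np.mean(pred == np.arange(50)))
```

Real embedding spaces are only approximately related by a linear map, and the unsupervised setting described in the abstract has no seed dictionary at all, which is why adversarial training is used there to find the mapping.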

