CL, Apr 20, 2019

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings

Haozhou Wang, James Henderson, Paola Merlo

arXiv:1904.09446v1 30.0996 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of aligning word embeddings across languages without high-quality parallel data, particularly for typologically distant language pairs, though the advance appears incremental.

The paper tackles the poor performance of cross-lingual word embeddings on typologically distant language pairs by proposing a weakly-supervised, concept-based adversarial training method, which for most languages improves on previous unsupervised adversarial methods.

Distributed representations of words, which map each word to a continuous vector, have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high-quality parallel data such as a dictionary or a sentence-aligned corpus. However, without post-refinement, the preliminary mapping produced by these methods performs poorly, which leads to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which, for most languages, improves the performance of previous unsupervised adversarial methods, especially for typologically distant language pairs.
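
To make the idea of a "mapping matrix" concrete, the following is a minimal sketch of how a linear map can align two embedding spaces and how it is then used to retrieve translations. It is not the paper's adversarial or concept-based method: as a stand-in, the map is fitted with the closed-form orthogonal Procrustes solution on a small seed set of word pairs, and the embeddings are synthetic. All names (`fit_orthogonal_map`, `translate`, `true_rotation`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_orthogonal_map(src, tgt):
    """Orthogonal W minimising ||src @ W.T - tgt||_F (Procrustes solution).

    Assumption: this supervised closed form stands in for the learned
    mapping; the paper trains its mapping adversarially instead.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return vt.T @ u.T

def translate(src_vecs, W, tgt_emb):
    """Map source vectors with W and return, for each one, the index of
    the nearest target word by cosine similarity."""
    mapped = src_vecs @ W.T
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)

# Synthetic toy data (assumption): the target space is an exact rotation
# of the source space, and word i in one language matches word i in the other.
rng = np.random.default_rng(0)
n_words, dim = 50, 8
src_emb = rng.normal(size=(n_words, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt_emb = src_emb @ true_rotation.T

# Fit the map on 20 seed pairs, then translate the whole vocabulary.
seed = np.arange(20)
W = fit_orthogonal_map(src_emb[seed], tgt_emb[seed])
predicted = translate(src_emb, W, tgt_emb)
accuracy = float(np.mean(predicted == np.arange(n_words)))
print(f"translation accuracy on toy data: {accuracy:.2f}")
```

Real monolingual embedding spaces are only approximately related by a rotation, which is one reason a preliminary mapping can be weak for distant language pairs; the toy data here is deliberately idealised.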
