CL, Sep 15, 2018

CLUSE: Cross-Lingual Unsupervised Sense Embeddings

arXiv:1809.05694v2, 1092 citations

AI Analysis

This work addresses the challenge of aligning word senses across languages for natural language processing applications. Its contribution is an incremental modeling improvement together with a new evaluation dataset.

The paper tackles the problem of learning cross-lingual sense embeddings by proposing a model that uses an English-Chinese parallel corpus to align bilingual sense representations, and it shows superior quality in evaluations on monolingual and bilingual datasets.

This paper proposes a modularized sense induction and representation learning model that jointly learns bilingual sense embeddings that align well in the vector space, where the cross-lingual signal in the English-Chinese parallel corpus is exploited to capture the collocation and distributed characteristics in the language pair. The model is evaluated on the Stanford Contextual Word Similarity (SCWS) dataset to ensure the quality of monolingual sense embeddings. In addition, the paper introduces Bilingual Contextual Word Similarity (BCWS), a large and high-quality dataset for evaluating cross-lingual sense embeddings, which is the first attempt at measuring whether the learned embeddings are indeed well aligned in the vector space. The proposed approach shows superior sense embedding quality in both monolingual and bilingual evaluations.
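
As a rough illustration of what a contextual word similarity evaluation such as SCWS or BCWS might look like, the sketch below scores each word pair by the cosine similarity of its two sense vectors and then compares the model's scores with human ratings using Spearman's rank correlation. This is a minimal sketch under stated assumptions, not the paper's implementation: the summary above does not name the similarity function or the correlation measure, the sense for each word is assumed to have been selected from its context already, and the vectors, ratings, and function names (`cosine`, `ranks`, `spearman`, `evaluate`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs):
    """pairs: iterable of (sense_vec_a, sense_vec_b, human_rating)."""
    model_scores = [cosine(a, b) for a, b, _ in pairs]
    human_scores = [rating for _, _, rating in pairs]
    return spearman(model_scores, human_scores)


# Hypothetical toy data: one English sense vector paired with three
# Chinese sense vectors, each with a made-up human similarity rating.
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], 9.0),
    ([1.0, 0.0], [0.5, 0.5], 5.0),
    ([1.0, 0.0], [0.0, 1.0], 1.0),
]
print(evaluate(toy_pairs))
```

In the bilingual case the two vectors in each pair come from different languages, so a high correlation is only possible if the two embedding spaces are aligned, which is the property BCWS is described as measuring.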
