CL | Jun 29, 2022

Chinese Word Sense Embedding with SememeWSD and Synonym Set

arXiv:2206.14388v1 | 5 citations | h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses polysemy in Chinese word embeddings for NLP applications, but the contribution is incremental: it builds on existing WSD and synonym-based methods.

The paper tackles the limitation of single-vector word embeddings for polysemous words by proposing the SWSDS model, which assigns distinct vectors to each sense using word sense disambiguation and synonym sets from OpenHowNet, achieving improved accuracy in semantic similarity calculations.

Word embedding is a fundamental natural language processing task that learns features of words. However, most word embedding methods assign only one vector to a word, even though polysemous words have multiple senses. To address this limitation, we propose the SememeWSD Synonym (SWSDS) model, which assigns a different vector to every sense of a polysemous word with the help of word sense disambiguation (WSD) and the synonym sets in OpenHowNet. We use the SememeWSD model, an unsupervised word sense disambiguation model based on OpenHowNet, to disambiguate each polysemous word and annotate it with a sense id. We then obtain the top 10 synonyms of the word sense from OpenHowNet and take the average of the synonym vectors as the vector of the word sense. In experiments, we evaluate the SWSDS model on semantic similarity calculation with Gensim's wmdistance method, where it improves accuracy. We also test the SememeWSD model with different BERT models to find the most effective one.
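
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense ids, the toy two-dimensional embeddings, and the helper names `average_vectors` and `sense_vector` are all hypothetical, and the synonym lists are hard-coded rather than retrieved from OpenHowNet. Skipping synonyms that have no embedding is also an assumption; the abstract does not say how such cases are handled.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_vectors(vectors: Sequence[Sequence[float]]) -> Vector:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sense_vector(
    synonyms: Sequence[str],
    embeddings: Dict[str, Vector],
    top_k: int = 10,
) -> Vector:
    """Average the embeddings of the first top_k synonyms of one word sense.

    Assumption: synonyms missing from `embeddings` are skipped.
    """
    found = [embeddings[w] for w in synonyms[:top_k] if w in embeddings]
    return average_vectors(found)


# Toy single-vector embeddings (hypothetical values, 2 dimensions).
embeddings: Dict[str, Vector] = {
    "梨": [1.0, 0.0],
    "香蕉": [0.8, 0.2],
    "电脑": [0.0, 1.0],
    "手机": [0.2, 0.8],
}

# Hypothetical sense ids for the polysemous word 苹果, each with a
# hand-written synonym list standing in for an OpenHowNet lookup.
synonyms_by_sense: Dict[str, List[str]] = {
    "苹果#fruit": ["梨", "香蕉"],
    "苹果#brand": ["电脑", "手机"],
}

# One vector per sense instead of one vector per word.
sense_vectors: Dict[str, Vector] = {
    sense_id: sense_vector(syns, embeddings)
    for sense_id, syns in synonyms_by_sense.items()
}
```

In this toy setup the two senses of 苹果 end up with clearly different vectors. In a fuller pipeline, sense-annotated tokens and their vectors could be placed in a Gensim `KeyedVectors` object so that sentence pairs can be compared with `wmdistance`, as the abstract describes.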

Code Implementations: 2 repos
