CL, Jun 6, 2017

Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity

Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera

arXiv:1706.01967v2, 66 citations

AI Analysis

This work addresses the need for more accurate semantic similarity in domain-specific NLP applications, though it is incremental as it builds on existing methods.

The paper addresses the poor performance of general-purpose semantic similarity measures in specific domains by introducing a domain-specific measure that combines word2vec with lexicon-based methods. The combined measure outperforms both generic and domain-specific word embedding methods that lack lexical augmentation, and the paper also shows that lemmatization improves word embedding performance.

Semantic similarity measures are an important part of Natural Language Processing tasks. However, semantic similarity measures built for general use do not perform well within specific domains. Therefore, in this study we introduce a domain-specific semantic similarity measure created by the synergistic union of word2vec, a word embedding method used for semantic similarity calculation, and lexicon-based (lexical) semantic similarity methods. We show that the proposed methodology outperforms both word embedding methods trained on a generic corpus and methods trained on a domain-specific corpus that do not use lexical semantic similarity methods to augment the results. Further, we show that text lemmatization can improve the performance of word embedding methods.
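
The abstract does not say how the embedding-based and lexicon-based scores are combined. As a purely illustrative sketch, the example below assumes a simple weighted average of the two. The function names, the `alpha` weight, the toy vectors, and the toy lexicon scores are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(w1, w2, embeddings, lexicon_sim, alpha=0.5):
    """Blend an embedding score with a lexicon score.

    ASSUMPTION: a weighted average with weight `alpha` on the embedding
    score. The paper's actual combination rule may differ.
    """
    emb_score = cosine_similarity(embeddings[w1], embeddings[w2])
    # Word pairs missing from the lexicon contribute a lexical score of 0.
    lex_score = lexicon_sim.get(frozenset((w1, w2)), 0.0)
    return alpha * emb_score + (1.0 - alpha) * lex_score


# Hypothetical toy data standing in for trained word2vec vectors and a
# domain lexicon; real vectors would have hundreds of dimensions.
EMBEDDINGS = {
    "court": [1.0, 0.0],
    "tribunal": [1.0, 1.0],
    "banana": [0.0, 1.0],
}
LEXICON_SIM = {frozenset(("court", "tribunal")): 0.9}

related = combined_similarity("court", "tribunal", EMBEDDINGS, LEXICON_SIM)
unrelated = combined_similarity("court", "banana", EMBEDDINGS, LEXICON_SIM)
```

With this toy data, the lexicon entry raises the score of the related pair above its embedding-only cosine similarity, while the unrelated pair receives no lexical contribution.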
