Estimating Text Similarity based on Semantic Concept Embeddings
This work addresses a specific problem in natural language processing with applications such as marketing, but it appears incremental because it builds on existing methods.
The paper tackles two shortcomings of Word2Vec embeddings: they do not adequately represent human thought processes, and they perform poorly on highly ambiguous words. To address both, it proposes Semantic Concept Embeddings based on the MultiNet Semantic Network formalism. Combining traditional word embeddings with these semantic concept embeddings increased the accuracy of predicted marketing target groups.
Due to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents, as well as in semantic similarity estimation. However, because they are extracted directly from a surface representation, they do not adequately represent human thought processes, and they perform poorly for highly ambiguous words. We therefore propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, which addresses both shortcomings. An evaluation on a marketing target group distribution task showed that the accuracy of the predicted target groups can be increased by combining traditional word embeddings with semantic CEs.
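The abstract does not say how text similarity is computed or how the two embedding types are combined. The following is only a minimal sketch under stated assumptions: each text is represented by the mean (centroid) of its token vectors, texts are compared by cosine similarity, and the word-level and concept-level scores are merged by a weighted mean with a hypothetical weight `alpha`. The function names, the concept identifiers (e.g. `bank_finance`, `bank_river`), and the toy vectors are all hypothetical; real concept sequences would come from an upstream semantic parser that is not shown. The toy data illustrates the motivation for CEs: the ambiguous word `bank` gets one shared word vector, while its two readings map to different concept vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(tokens: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Average the vectors of all tokens that have an entry in the lookup table."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        raise ValueError("no token of the text has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combined_similarity(
    words_a: Sequence[str],
    words_b: Sequence[str],
    concepts_a: Sequence[str],
    concepts_b: Sequence[str],
    word_table: Dict[str, Vector],
    concept_table: Dict[str, Vector],
    alpha: float = 0.5,
) -> float:
    """Hypothetical combination: weighted mean of word- and concept-level cosine scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s_word = cosine(centroid(words_a, word_table), centroid(words_b, word_table))
    s_concept = cosine(centroid(concepts_a, concept_table), centroid(concepts_b, concept_table))
    return alpha * s_word + (1.0 - alpha) * s_concept


if __name__ == "__main__":
    # Toy 2-d vectors (hypothetical); "bank" has a single word vector.
    words = {"bank": [1.0, 0.0], "loan": [1.0, 1.0], "shore": [1.0, -1.0]}
    # Hypothetical concept IDs separate the financial and the river reading of "bank".
    concepts = {
        "bank_finance": [1.0, 0.0], "loan_credit": [1.0, 0.0],
        "bank_river": [0.0, 1.0], "shore_coast": [0.0, 1.0],
    }
    a_words, b_words = ["bank", "loan"], ["bank", "shore"]
    a_concepts, b_concepts = ["bank_finance", "loan_credit"], ["bank_river", "shore_coast"]

    word_only = cosine(centroid(a_words, words), centroid(b_words, words))
    print(round(word_only, 3))  # → 0.6
    both = combined_similarity(a_words, b_words, a_concepts, b_concepts, words, concepts)
    print(round(both, 3))  # → 0.3
```

Here the shared surface form `bank` makes the two texts look moderately similar at the word level, while the concept level reports no overlap, which pulls the combined score down. Averaging two scores is just one simple choice; concatenating the word and concept centroids before a downstream classifier would be an equally plausible reading of "combining", and the abstract does not say which one the authors used.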