Calculated attributes of synonym sets
This work addresses the gap between theoretical linguistics and empirical computational methods for synonym analysis, but it appears incremental as it builds on existing word embedding techniques without demonstrating major breakthroughs.
The paper formalizes synonym sets (synsets) through a geometric model built on word embeddings. It introduces three characteristics, the synset interior, the word rank, and the centrality, to identify the most significant words in a synset. Experiments use RusVectores resources, but the abstract reports no concrete numerical results.
The goal of the formalization proposed in this paper is to bring together, as closely as possible, the theoretical linguistic problem of defining synonymy and the methods of computational linguistics, which rest largely on empirical, intuitive, and unjustified factors. Using word vector representations, we propose a geometric approach to the mathematical modeling of a synonym set (synset). The word embeddings are based on the neural network models Skip-gram and CBOW, developed and implemented in the word2vec program by T. Mikolov. The standard cosine similarity is used as the measure of closeness between word vectors. Several geometric characteristics of synset words are introduced: the interior of a synset, the rank of a synset word, and its centrality. These notions are intended to select the most significant synset words, i.e. the words whose senses are nearest to the sense of the synset. Some experiments with the proposed notions, based on RusVectores resources, are presented. A brief description of this work is available in the slides at https://goo.gl/K82Fei
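The abstract names cosine similarity as the comparison between word vectors but does not define rank or centrality. The following is a minimal sketch of cosine similarity, plus one hypothetical way to order synset words by how close they are to the rest of the set. The function `rank_by_mean_similarity`, the scoring rule (mean similarity to the other words), and the toy two-dimensional vectors are assumptions made for illustration. They are not the paper's definitions and not RusVectores data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_mean_similarity(synset):
    """Order synset words by mean cosine similarity to the other words.

    Hypothetical proxy for "significance"; NOT the paper's rank or
    centrality. Expects a dict word -> vector with at least two words.
    """
    scores = {}
    for word, vec in synset.items():
        others = [v for w, v in synset.items() if w != word]
        total = sum(cosine_similarity(vec, other) for other in others)
        scores[word] = total / len(others)
    # Highest mean similarity first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-D vectors invented for illustration; real embeddings
# typically have hundreds of dimensions.
toy_synset = {
    "big": [1.0, 0.0],
    "large": [0.9, 0.1],
    "huge": [0.6, 0.8],
}

ranking = rank_by_mean_similarity(toy_synset)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

With these toy vectors, "large" comes first because it lies between the other two words, and "huge" comes last. With real embeddings, the vectors would be loaded from a pretrained model rather than written by hand.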