CLAug 27, 2016

Testing APSyn against Vector Cosine on Similarity Estimation

Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, Chu-Ren Huang, Philippe Blache

arXiv:1608.07738v212.226 citations

Originality Incremental advance

AI Analysis

This work addresses a specific problem in natural language processing for researchers and practitioners by providing an incremental improvement over existing similarity estimation methods.

The paper tackled the problem of similarity estimation in Distributional Semantic Models by evaluating APSyn, a measure based on intersection of associated contexts, against the widely used Vector Cosine. The results showed that APSyn is highly competitive on popular test sets, addressing some weaknesses of Vector Cosine and performing well on genuine similarity estimation.

In Distributional Semantic Models (DSMs), Vector Cosine is widely used to estimate similarity between word vectors, although this measure was noticed to suffer from several shortcomings. The recent literature has proposed other methods which attempt to mitigate such biases. In this paper, we intend to investigate APSyn, a measure that computes the extent of the intersection between the most associated contexts of two target words, weighting it by context relevance. We evaluated this metric in a similarity estimation task on several popular test sets, and our results show that APSyn is in fact highly competitive, even with respect to the results reported in the literature for word embeddings. On top of it, APSyn addresses some of the weaknesses of Vector Cosine, performing well also on genuine similarity estimation.

View on arXiv PDF

Similar