CL | Aug 27, 2016

Testing APSyn against Vector Cosine on Similarity Estimation

arXiv:1608.07738v2 | 26 citations
Originality: Incremental advance
AI Analysis

This work offers natural language processing researchers and practitioners an incremental improvement in similarity estimation: it tests APSyn as an alternative to the widely used Vector Cosine.

The paper tackles similarity estimation in Distributional Semantic Models by evaluating APSyn, a measure based on the intersection of two words' most associated contexts, against the widely used Vector Cosine. The results show that APSyn is highly competitive on popular test sets, addresses some weaknesses of Vector Cosine, and performs well on genuine similarity estimation.

In Distributional Semantic Models (DSMs), Vector Cosine is widely used to estimate similarity between word vectors, although this measure has been observed to suffer from several shortcomings. The recent literature has proposed other methods which attempt to mitigate such biases. In this paper, we investigate APSyn, a measure that computes the extent of the intersection between the most associated contexts of two target words, weighting it by context relevance. We evaluated this metric in a similarity estimation task on several popular test sets, and our results show that APSyn is in fact highly competitive, even with respect to the results reported in the literature for word embeddings. Moreover, APSyn addresses some of the weaknesses of Vector Cosine, also performing well on genuine similarity estimation.
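To make the two measures concrete, here is a minimal Python sketch. It is not the authors' code, and the abstract does not state the formula. The sketch assumes the form in which APSyn is usually given: keep the top N contexts of each word, ranked by association weight, and sum over the shared contexts the inverse of their average rank. The function names, the dictionary representation of a word vector, the toy weights, and the default cutoff `n` are all hypothetical choices made for illustration.

```python
import math


def top_contexts(weights, n):
    """Map each of the n highest-weighted contexts to its rank (1 = strongest)."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}


def apsyn(weights_a, weights_b, n=1000):
    """APSyn sketch: sum of 1 / (average rank) over the shared top-n contexts.

    Assumption: weights are association scores (for example PPMI) keyed by
    context; the cutoff n is a free parameter.
    """
    ranks_a = top_contexts(weights_a, n)
    ranks_b = top_contexts(weights_b, n)
    shared = ranks_a.keys() & ranks_b.keys()
    return sum(1.0 / ((ranks_a[c] + ranks_b[c]) / 2.0) for c in shared)


def vector_cosine(weights_a, weights_b):
    """Standard cosine between two sparse vectors stored as dicts."""
    dot = sum(v * weights_b.get(c, 0.0) for c, v in weights_a.items())
    norm_a = math.sqrt(sum(v * v for v in weights_a.values()))
    norm_b = math.sqrt(sum(v * v for v in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, made-up association weights for two target words.
dog = {"bark": 5.0, "pet": 4.0, "tail": 3.0, "run": 1.0}
cat = {"pet": 5.0, "meow": 4.0, "tail": 2.0, "run": 1.5}

apsyn_score = apsyn(dog, cat, n=3)
cosine_score = vector_cosine(dog, cat)
```

With `n=3`, only "pet" and "tail" fall in both top-3 lists, so only they contribute to the APSyn score; "run" is shared but ranked too low for either word. Vector Cosine, by contrast, uses every dimension of both vectors, including the weakly associated ones.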

