CL DS Sep 1, 2020

Document Similarity from Vector Space Densities

arXiv:2009.00672v1 4 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient document similarity estimation. It is an incremental advance, since it builds on existing word embedding and kernel regression techniques.

The authors tackle the problem of estimating similarities between text documents by proposing a computationally light method called density similarity (DS). They report that it achieves virtually the same accuracy as a state-of-the-art method, with a substantial gain in speed.

We propose a computationally light method for estimating similarities between text documents, which we call the density similarity (DS) method. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel regression, and takes into account semantic relations among words. We find that the accuracy of this method is virtually the same as that of a state-of-the-art method, while the gain in speed is very substantial. Additionally, we introduce generalized versions of the top-k accuracy metric and of the Jaccard metric of agreement between similarity models.
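To make the idea of "agreement between similarity models" concrete, here is a minimal sketch of the standard (not the generalized) Jaccard agreement computed on top-k neighbor sets. The abstract does not define the generalized metrics, so this is only the textbook baseline they presumably extend. The function names `top_k_neighbors` and `mean_jaccard_agreement`, and the assumption that each model is given as a square document-by-document similarity matrix, are illustrative choices and do not come from the paper.

```python
import numpy as np


def top_k_neighbors(sim, k):
    """For each document i, return the set of indices of its k most
    similar other documents according to the similarity matrix `sim`."""
    sim = np.asarray(sim, dtype=float)
    neighbors = []
    for i in range(sim.shape[0]):
        row = sim[i].copy()
        row[i] = -np.inf  # a document is not its own neighbor
        top = np.argsort(-row, kind="stable")[:k]
        neighbors.append({int(j) for j in top})
    return neighbors


def mean_jaccard_agreement(sim_a, sim_b, k):
    """Average, over documents, of the Jaccard index |A & B| / |A | B|
    between the top-k neighbor sets produced by two similarity models.
    Assumes k >= 1 and at least two documents (hypothetical helper)."""
    nbrs_a = top_k_neighbors(sim_a, k)
    nbrs_b = top_k_neighbors(sim_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(nbrs_a, nbrs_b)]
    return float(np.mean(scores))


# Two toy similarity models over four documents (made-up numbers).
sim_a = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
sim_b = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.3],
    [0.1, 0.8, 0.3, 1.0],
])

agreement_k1 = mean_jaccard_agreement(sim_a, sim_b, k=1)
agreement_k2 = mean_jaccard_agreement(sim_a, sim_b, k=2)
self_agreement = mean_jaccard_agreement(sim_a, sim_a, k=2)
```

A value of 1 means the two models return identical top-k sets for every document, and 0 means the sets never overlap. Comparing neighbor sets rather than raw scores keeps the measure insensitive to how each model scales its similarities.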
