IR, LG | May 17, 2019

Cleaned Similarity for Better Memory-Based Recommenders

arXiv:1905.07370v1 | 2 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in recommendation systems for users and developers, but it is incremental as it builds on existing similarity estimators.

The paper tackled the problem of noise and eigenvalue spreading in similarity estimators used by memory-based collaborative filtering methods, and proposed a re-scaling and noise cleaning scheme that improved performance over vanilla methods.

Memory-based collaborative filtering methods like user or item k-nearest neighbors (kNN) are a simple yet effective solution to the recommendation problem. The backbone of these methods is the estimation of the empirical similarity between users/items. In this paper, we analyze the spectral properties of the Pearson and the cosine similarity estimators, and we use tools from random matrix theory to argue that they suffer from noise and eigenvalue spreading. We argue that, unlike the Pearson correlation, the cosine similarity naturally possesses the desirable property of eigenvalue shrinkage for large eigenvalues. However, due to its zero-mean assumption, it overestimates the largest eigenvalues. We quantify this overestimation and present a simple re-scaling and noise cleaning scheme. This results in better performance of the memory-based methods compared to their vanilla counterparts.
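
To make the spectral argument concrete, here is a minimal NumPy sketch (not the paper's code) that builds item-item cosine and Pearson similarity matrices from a synthetic dense ratings matrix and compares their largest eigenvalues. The data, the function names, and the eigenvalue-clipping step are all assumptions for illustration: clipping below the Marchenko-Pastur edge is a generic random-matrix cleaning recipe, swapped in here because the abstract does not spell out the paper's own re-scaling and cleaning scheme.

```python
import numpy as np


def cosine_similarity(R):
    """Item-item cosine similarity; columns of R are items, rows are users."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # guard against all-zero item columns
    X = R / norms
    return X.T @ X


def pearson_similarity(R):
    """Item-item Pearson correlation: cosine similarity of mean-centred columns."""
    return cosine_similarity(R - R.mean(axis=0))


def clip_eigenvalues(S, n_samples):
    """Generic eigenvalue clipping (an assumed stand-in, NOT the paper's scheme).

    Eigenvalues below the Marchenko-Pastur upper edge (1 + sqrt(q))^2, with
    q = n_items / n_samples, are treated as noise and replaced by their mean,
    which keeps the trace of the matrix unchanged.
    """
    n_items = S.shape[0]
    q = n_items / n_samples
    edge = (1.0 + np.sqrt(q)) ** 2
    vals, vecs = np.linalg.eigh(S)  # S is symmetric, so eigh applies
    noise = vals < edge
    cleaned = vals.copy()
    if noise.any():
        cleaned[noise] = vals[noise].mean()
    return (vecs * cleaned) @ vecs.T


# Synthetic ratings (hypothetical data): 500 users, 50 items, non-zero mean.
rng = np.random.default_rng(0)
R = rng.normal(loc=3.0, scale=1.0, size=(500, 50))

S_cos = cosine_similarity(R)
S_pea = pearson_similarity(R)

top_cos = np.linalg.eigvalsh(S_cos)[-1]  # eigvalsh returns ascending order
top_pea = np.linalg.eigvalsh(S_pea)[-1]
print(f"largest eigenvalue, cosine:  {top_cos:.2f}")
print(f"largest eigenvalue, Pearson: {top_pea:.2f}")

S_clean = clip_eigenvalues(S_pea, n_samples=R.shape[0])
```

Because the synthetic ratings have a non-zero mean that the cosine estimator does not remove, its top eigenvalue comes out far larger than the Pearson one, which is the overestimation effect the abstract describes. Real rating matrices are sparse, so a practical version would also need to handle missing entries.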

