LG IR ML | Jun 20, 2012

Collaborative Filtering and the Missing at Random Assumption

Benjamin Marlin, Richard S. Zemel, Sam Roweis, Malcolm Slaney

arXiv:1206.5267v1 | 331 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a fundamental issue in collaborative filtering for online services, but it is incremental in that it builds on existing assumptions and methods.

The paper tackles the problem of rating prediction in collaborative filtering by challenging the missing at random (MAR) assumption, showing through a user study that user-selected ratings differ from random samples and that modeling the missing data mechanism improves prediction performance on random ratings.

Rating prediction is an important application and a popular research topic in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties from ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
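To make the MAR violation described in the abstract concrete, the following is a minimal simulation sketch, not the authors' model or data. It assumes a hypothetical 1-5 rating scale, a made-up distribution of true ratings, and a made-up rule under which users are more likely to rate songs they like. All names and probabilities here (`simulate_ratings`, `P_OBSERVE_GIVEN_RATING`, the weights) are illustrative assumptions. The sketch compares the mean of user-selected ratings with the mean of a uniformly random sample of the same underlying ratings.

```python
import random

# Hypothetical distribution of true ratings (assumption, not from the paper):
# most songs are disliked, few are loved.
RATING_VALUES = [1, 2, 3, 4, 5]
RATING_WEIGHTS = [0.35, 0.25, 0.20, 0.12, 0.08]

# Hypothetical non-MAR mechanism (assumption): the chance that a user
# chooses to rate a song depends on the rating value itself.
P_OBSERVE_GIVEN_RATING = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.30, 5: 0.50}

# Under a random sample, every rating is collected with the same probability.
P_OBSERVE_RANDOM = 0.10


def simulate_ratings(n_ratings=20000, seed=0):
    """Return (true, user_selected, random_sample) lists of ratings."""
    rng = random.Random(seed)
    true_ratings = rng.choices(RATING_VALUES, weights=RATING_WEIGHTS, k=n_ratings)
    # Observation depends on the rating value: violates MAR.
    user_selected = [
        r for r in true_ratings if rng.random() < P_OBSERVE_GIVEN_RATING[r]
    ]
    # Observation is independent of the rating value: satisfies MAR.
    random_sample = [r for r in true_ratings if rng.random() < P_OBSERVE_RANDOM]
    return true_ratings, user_selected, random_sample


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    true_ratings, user_selected, random_sample = simulate_ratings()
    print("mean of all true ratings:     ", round(mean(true_ratings), 2))
    print("mean of user-selected ratings:", round(mean(user_selected), 2))
    print("mean of random sample:        ", round(mean(random_sample), 2))
```

Under these assumed probabilities, the user-selected mean sits well above the true mean while the random-sample mean stays close to it. This is the kind of gap that makes training and testing only on user-selected ratings unreliable when the MAR assumption fails.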
