Generative Interest Estimation for Document Recommendations
This work aims to improve recommendation accuracy in content-based systems; the contribution appears incremental, since it builds on existing representation learning methods.
The paper addresses content-based document recommendation. It proposes a system that learns document representations and models each user's interest as a Gaussian mixture model in that space. On the Delicious bookmarks dataset, the learned representations outperform Latent Semantic Analysis in predictive performance.
Learning distributed representations of documents has pushed the state of the art in several natural language processing tasks and has recently been applied successfully to recommender systems. In this paper, we propose a novel content-based recommender system based on learned representations and a generative model of user interest. Our method works as follows. First, we learn representations on a corpus of text documents. Then, we capture a user's interest as a generative model in the space of the document representations. In particular, we model the distribution of interest for each user as a Gaussian mixture model (GMM). Recommendations can be obtained directly by sampling from a user's generative model. Using latent semantic analysis (LSA) as a baseline for comparison, we compute and explore document representations on the Delicious bookmarks dataset, a standard benchmark for recommender systems. We then perform density estimation in both spaces and show that learned representations outperform LSA in terms of predictive performance.
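The per-user modelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes document representations are already available as plain vectors (here hypothetical 2-D toy points), fits a diagonal-covariance GMM with a basic EM loop written against the Python standard library only, and then obtains a recommendation by sampling a point from the user's model. Mapping a sampled point to the nearest candidate document, and ranking candidates by density, are assumptions made for illustration; the abstract does not specify either step. All function names, document ids, and parameters are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, weights, means, variances):
    """Log of (weight * component density) for every mixture component."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def log_density(x, weights, means, variances):
    """Log density of the mixture at x, using log-sum-exp for stability."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def fit_gmm(points, k, iters=50, seed=0, min_var=1e-3):
    """Fit a k-component diagonal GMM with plain EM (toy implementation)."""
    rng = random.Random(seed)
    dim = len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # init means from data
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in points:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty component
            weights[j] = nj / len(points)
            for d in range(dim):
                mu = sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[d] - mu) ** 2
                          for r, x in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, min_var)  # variance floor
    return weights, means, variances

def sample_point(weights, means, variances, rng):
    """Draw one point: pick a component by weight, then sample from it."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[j], variances[j])]

def nearest_document(point, candidates):
    """Assumption: map a sampled point to the closest candidate document."""
    def sq_dist(vec):
        return sum((a - b) ** 2 for a, b in zip(point, vec))
    return min(candidates, key=lambda doc_id: sq_dist(candidates[doc_id]))

# Hypothetical toy data: one user's bookmarks cluster around two "topics".
data_rng = random.Random(1)
bookmarks = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(30)]
bookmarks += [[data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)] for _ in range(30)]

# Hypothetical unseen candidate documents in the same representation space.
candidates = {"doc_a": [0.1, -0.2], "doc_b": [5.2, 4.9], "doc_c": [10.0, -10.0]}

weights, means, variances = fit_gmm(bookmarks, k=2)

# Recommendation by sampling from the user's generative model.
rec = nearest_document(
    sample_point(weights, means, variances, random.Random(2)), candidates)

# Alternative view: rank candidates by their density under the user's model.
ranked = sorted(candidates, reverse=True,
                key=lambda d: log_density(candidates[d], weights, means, variances))
print("sampled recommendation:", rec)
print("ranking by density:", ranked)
```

In practice one would more likely use an existing GMM implementation with full covariances and choose the number of components per user by validation; the hand-written EM loop here only keeps the sketch self-contained.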