IR · Sep 2, 2019

All You Need is Ratings: A Clustering Approach to Synthetic Rating Datasets Generation

arXiv:1909.00687v1 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for reliable synthetic data for offline evaluation of recommender systems. The advance is incremental because it builds on existing generative approaches.

The paper tackles the shortage of public rating datasets for recommender system evaluation by proposing a method that generates synthetic datasets mimicking real ones. It empirically validates that the synthetic datasets produce evaluation results comparable to those obtained with real datasets.

The public availability of collections containing user preferences is of vital importance for performing offline evaluations in the field of recommender systems. However, the number of rating datasets is limited because of the costs required for their creation and the fear of violating the privacy of the users by sharing them. For this reason, numerous research attempts investigated the creation of synthetic collections of ratings using generative approaches. Nevertheless, these datasets are usually not reliable enough for conducting an evaluation campaign. In this paper, we propose a method for creating synthetic datasets with a configurable number of users that mimic the characteristics of already existing ones. We empirically validated the proposed approach by exploiting the synthetic datasets for evaluating different recommenders and by comparing the results with the ones obtained using real datasets.
