Pairwise, Magnitude, or Stars: What's the Best Way for Crowds to Rate?
This work addresses the problem of choosing a rating system for crowdsourcing applications. Its contribution is incremental: it compares existing methods rather than introducing a new one.
The study compares three content rating techniques (five-star rating, pairwise comparison, and magnitude estimation) in terms of user effort, preference prediction accuracy, and the number of ratings required, based on 39,000 ratings collected on a crowdsourcing platform.
We compare three popular techniques for rating content: the ubiquitous five-star rating, the less used pairwise comparison, and the recently introduced (in crowdsourcing) magnitude estimation approach. Each technique has specific advantages and disadvantages in terms of the user effort required, the accuracy achievable when predicting user preferences, and the number of ratings required. We design an experiment in which the three techniques are compared in an unbiased way. We collected 39,000 ratings on a popular crowdsourcing platform, allowing us to release a dataset that will be useful for many related studies on user rating techniques.
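To make the three rating formats concrete, the following minimal sketch shows one way raw ratings of each kind could be reduced to a per-item score and then to a ranking. It is an illustration under stated assumptions, not the aggregation used in the study: the function names, the input tuple layouts, and the aggregation rules (mean stars, win fraction, and per-user geometric-mean normalization for magnitude estimates) are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def stars_scores(ratings):
    """ratings: iterable of (item, stars), stars in 1..5.
    Assumed aggregation: arithmetic mean of the stars per item."""
    by_item = defaultdict(list)
    for item, stars in ratings:
        by_item[item].append(stars)
    return {item: sum(v) / len(v) for item, v in by_item.items()}


def pairwise_scores(comparisons):
    """comparisons: iterable of (winner, loser).
    Assumed aggregation: fraction of comparisons an item won."""
    wins = defaultdict(int)
    seen = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {item: wins[item] / seen[item] for item in seen}


def magnitude_scores(estimates):
    """estimates: iterable of (user, item, value), value > 0.
    Assumed aggregation: divide each value by that user's geometric
    mean (users pick their own scale), then take the geometric mean
    of the normalized values per item."""
    by_user = defaultdict(list)
    for user, item, value in estimates:
        by_user[user].append((item, value))
    logs = defaultdict(list)
    for rows in by_user.values():
        user_mean_log = sum(math.log(v) for _, v in rows) / len(rows)
        for item, v in rows:
            logs[item].append(math.log(v) - user_mean_log)
    return {item: math.exp(sum(ls) / len(ls)) for item, ls in logs.items()}


def ranking(scores):
    """Items ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    stars = [("a", 5), ("a", 4), ("b", 2), ("c", 3)]
    pairs = [("a", "b"), ("a", "c"), ("c", "b")]
    mags = [("u1", "a", 100), ("u1", "c", 50), ("u1", "b", 10),
            ("u2", "a", 8), ("u2", "c", 4), ("u2", "b", 1)]
    print(ranking(stars_scores(stars)))
    print(ranking(pairwise_scores(pairs)))
    print(ranking(magnitude_scores(mags)))
```

Working in log space for the magnitude estimates reflects the fact that raters choose their own numeric scale, so only ratios between a rater's values are comparable across raters. Once each technique yields a ranking, the rankings can be compared against held-out preferences to assess prediction accuracy.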