IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons
The dataset provides a resource for evaluating AI models on tasks such as information retrieval and recommender systems, and it addresses a bottleneck in gathering human feedback. The contribution is incremental, since it builds on existing datasets.
The authors tackled the lack of large-scale public datasets for subjective pairwise comparisons by introducing IMDB-WIKI-SbS, a dataset with 250,249 annotated image pairs from 9,150 images, built using crowdsourcing and balanced for age and gender.
Today, comprehensive evaluation of large-scale machine learning models is possible thanks to open datasets produced using crowdsourcing, such as SQuAD, MS COCO, ImageNet, and SuperGLUE. These datasets capture objective responses and assume a single correct answer, so they cannot capture subjective human perception. In contrast, pairwise comparison tasks, in which one has to choose between only two options, make it possible to take people's preferences into account in very challenging artificial intelligence tasks, such as information retrieval and recommender system evaluation. Unfortunately, the available datasets are either small or proprietary, which slows progress in gathering better feedback from human users. In this paper, we present IMDB-WIKI-SbS, a new large-scale dataset for evaluating pairwise comparisons. It contains 9,150 images appearing in 250,249 pairs annotated on a crowdsourcing platform. Our dataset has balanced distributions of age and gender and uses the well-known IMDB-WIKI dataset as ground truth. We describe how our dataset is built and then compare several baseline methods, indicating its suitability for model evaluation.
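To make the notion of evaluating pairwise comparisons concrete, the following minimal sketch aggregates crowdsourced pairwise judgments into per-item scores with the Bradley-Terry model, fitted by the standard minorization-maximization update. This is an illustration under stated assumptions, not necessarily one of the baselines compared in the paper: the function name `bradley_terry`, the `(winner, loser)` input format, the `smoothing` pseudo-count, and the item labels are all hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200, tol=1e-9, smoothing=1e-3):
    """Estimate Bradley-Terry scores from (winner, loser) pairs.

    `smoothing` is a small pseudo-count added to every item's win total
    (an assumption of this sketch) so that items with zero wins keep a
    positive score and the update never divides by zero.
    """
    wins = defaultdict(float)
    games = defaultdict(lambda: defaultdict(int))  # games[i][j] = times i met j
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1.0
        games[winner][loser] += 1
        games[loser][winner] += 1

    if not games:
        return {}

    # Start from a uniform score vector that sums to one.
    scores = {item: 1.0 / len(games) for item in games}
    for _ in range(n_iter):
        updated = {}
        for i, opponents in games.items():
            denom = sum(n / (scores[i] + scores[j]) for j, n in opponents.items())
            updated[i] = (wins[i] + smoothing) / denom
        # Renormalize so scores stay comparable across iterations.
        total = sum(updated.values())
        updated = {i: s / total for i, s in updated.items()}
        delta = max(abs(updated[i] - scores[i]) for i in scores)
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical judgments: each tuple is (image judged older, other image).
    votes = [("a", "b")] * 3 + [("b", "a")] + [("b", "c")] * 3 + [("a", "c")] * 2
    ranking = sorted(bradley_terry(votes).items(), key=lambda kv: -kv[1])
    for item, score in ranking:
        print(item, round(score, 3))
```

With ground-truth ages available, as in a dataset built on IMDB-WIKI, the ranking induced by such scores could then be compared against the true ordering, for example with a rank correlation measure.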