HC, Oct 28, 2021

IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons

arXiv:2110.14990v2, 13 citations
Originality: Synthesis-oriented
AI Analysis

The dataset provides a resource for evaluating AI models on tasks such as information retrieval and recommender systems, addressing a bottleneck in gathering human feedback. The contribution is incremental, since it builds on existing datasets.

The authors tackled the lack of large-scale public datasets for subjective pairwise comparisons by introducing IMDB-WIKI-SbS, a dataset with 250,249 annotated image pairs from 9,150 images, built using crowdsourcing and balanced for age and gender.

Today, comprehensive evaluation of large-scale machine learning models is possible thanks to open datasets produced using crowdsourcing, such as SQuAD, MS COCO, ImageNet, and SuperGLUE. These datasets capture objective responses and assume a single correct answer, so they cannot capture subjective human perception. In turn, pairwise comparison tasks, in which one has to choose between only two options, allow taking people's preferences into account for very challenging artificial intelligence tasks, such as information retrieval and recommender system evaluation. Unfortunately, the available datasets are either small or proprietary, slowing down progress in gathering better feedback from human users. In this paper, we present IMDB-WIKI-SbS, a new large-scale dataset for evaluating pairwise comparisons. It contains 9,150 images appearing in 250,249 pairs annotated on a crowdsourcing platform. Our dataset has balanced distributions of age and gender, using the well-known IMDB-WIKI dataset as ground truth. We describe how our dataset is built and then compare several baseline methods, indicating its suitability for model evaluation.
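
To make the idea of aggregating pairwise comparisons concrete, here is a minimal sketch in Python. The abstract does not name the baseline methods it compares, so the Bradley-Terry model used here is an assumption chosen only because it is a common way to turn pairwise choices into a ranking. The function name `bradley_terry`, the `(winner, loser)` input format, and the toy item labels are all hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200, tol=1e-10):
    """Estimate Bradley-Terry scores from (winner, loser) pairs.

    Hypothetical helper for illustration; uses the classic
    minorization-maximization update and normalizes scores to sum to 1.
    """
    wins = defaultdict(int)                      # total wins per item
    counts = defaultdict(lambda: defaultdict(int))  # comparisons per pair
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1
        counts[winner][loser] += 1
        counts[loser][winner] += 1
        items.update((winner, loser))

    if not items:
        return {}

    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(n / (scores[i] + scores[j])
                        for j, n in counts[i].items())
            # Floor at a tiny value so items with zero wins never
            # produce a zero denominator in later iterations.
            updated[i] = max(wins[i] / denom, 1e-12)
        total = sum(updated.values())
        updated = {i: s / total for i, s in updated.items()}
        delta = max(abs(updated[i] - scores[i]) for i in items)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy data (made up): each tuple is one annotator choice, (chosen, rejected).
toy = ([("a", "b")] * 3 + [("b", "a")] +
       [("b", "c")] * 3 + [("c", "b")] +
       [("a", "c")] * 3 + [("c", "a")])
scores = bradley_terry(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With ground-truth labels available, as the abstract describes for this dataset, a ranking produced this way could be scored against the known ordering, which is what makes the data usable for comparing aggregation methods.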

Code Implementations: 1 repo
