ML LG | Jun 25, 2014

When is it Better to Compare than to Score?

Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright

arXiv:1406.6618v1 | 25 citations

Originality: Incremental advance

AI Analysis

This work addresses a practical problem for researchers and practitioners in fields such as crowdsourcing and data collection, and offers guidelines for selecting a measurement scheme. It is incremental, however, because it builds on existing models such as Thurstone and BTL.

The paper tackles the problem of choosing between direct-scoring (cardinal) and comparative (ordinal) measurements when eliciting human judgments. It finds that ordinal measurements often have lower per-sample noise and are faster to elicit, but provide less information. Theoretical and empirical results show that ordinal methods yield smaller estimation errors when their noise is sufficiently low.

When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements. In this paper we study the relative merits of either choice, providing empirical and theoretical guidelines for the selection of a measurement scheme. We provide empirical evidence, based on experiments on Amazon Mechanical Turk, that in a variety of tasks (pairwise-comparative) ordinal measurements have lower per-sample noise and are typically faster to elicit than cardinal ones. Ordinal measurements, however, typically provide less information. We then consider the popular Thurstone and Bradley-Terry-Luce (BTL) models for ordinal measurements and characterize the minimax error rates for estimating the unknown quantity. We compare these minimax error rates to those under cardinal measurement models and quantify the noise levels for which ordinal measurements are better. Finally, we revisit the data collected from our experiments and show that fitting these models confirms this prediction: for tasks where the noise in ordinal measurements is sufficiently low, the ordinal approach results in smaller estimation errors.
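As a rough illustration of the two ordinal models named in the abstract, the sketch below computes the probability that one item is preferred over another under the BTL and Thurstone models and simulates noisy pairwise comparisons. The function names, the score values, and the `sigma` noise-scale parameterization are assumptions made for this example; the paper's exact parameterization may differ.

```python
import math
import random


def btl_prob(w_i, w_j, sigma=1.0):
    """P(item i is preferred to item j) under a BTL-style model.

    Uses the logistic function of the scaled score difference.
    `sigma` is an assumed noise scale: larger sigma means noisier comparisons.
    """
    return 1.0 / (1.0 + math.exp(-(w_i - w_j) / sigma))


def thurstone_prob(w_i, w_j, sigma=1.0):
    """P(item i is preferred to item j) under a Thurstone-style model.

    Uses the standard normal CDF of the scaled score difference.
    """
    z = (w_i - w_j) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_comparisons(w_i, w_j, n, prob_fn, rng, sigma=1.0):
    """Draw n independent pairwise outcomes; return the fraction won by i."""
    p = prob_fn(w_i, w_j, sigma)
    wins = sum(1 for _ in range(n) if rng.random() < p)
    return wins / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    w_a, w_b = 1.0, 0.0     # hypothetical latent quality scores
    for name, fn in [("BTL", btl_prob), ("Thurstone", thurstone_prob)]:
        p = fn(w_a, w_b)
        frac = simulate_comparisons(w_a, w_b, 20000, fn, rng)
        print(f"{name}: model P(a > b) = {p:.3f}, empirical = {frac:.3f}")
```

Each comparison yields only one bit, which is the sense in which ordinal measurements carry less information than a numeric score. Increasing `sigma` pushes both probabilities toward 0.5, so every comparison becomes less informative about the underlying scores.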

