MLLGSTAug 27, 2024

Optimal level set estimation for non-parametric tournament and crowdsourcing problems

arXiv:2408.15356v11 citationsh-index: 4
Originality Incremental advance
AI Analysis

This work addresses efficient worker allocation in crowdsourcing by providing an optimal algorithm for a specific matrix model, though it is incremental as it builds on existing permutation models.

The paper tackles the problem of estimating level sets in a bi-isotonic matrix from partial observations, which is relevant for crowdsourcing and tournament ranking, and presents a polynomial-time algorithm that achieves minimax optimal classification error.

Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions. In this paper, we assume that both the experts and the questions can be ordered, namely that the matrix $M$ containing the probability that expert $i$ answers correctly to question $j$ is bi-isotonic up to a permutation of it rows and columns. When $n=d$, this also encompasses the strongly stochastic transitive (SST) model from the tournament literature. Here, we focus on the relevant problem of deciphering small entries of $M$ from large entries of $M$, which is key in crowdsourcing for efficient allocation of workers to questions. More precisely, we aim at recovering a (or several) level set $p$ of the matrix up to a precision $h$, namely recovering resp. the sets of positions $(i,j)$ in $M$ such that $M_{ij}>p+h$ and $M_{i,j}<p-h$. We consider, as a loss measure, the number of misclassified entries. As our main result, we construct an efficient polynomial-time algorithm that turns out to be minimax optimal for this classification problem. This heavily contrasts with existing literature in the SST model where, for the stronger reconstruction loss, statistical-computational gaps have been conjectured. More generally, this shades light on the nature of statistical-computational gaps for permutations models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes