DS LG SPApr 11, 2022

On Top-$k$ Selection from $m$-wise Partial Rankings via Borda Counting

Wenjing Chen, Ruida Zhou, Chao Tian, Cong Shen

arXiv:2204.05742v12.36 citationsh-index: 32

Originality Incremental advance

AI Analysis

This work addresses ranking and selection problems in data analysis, offering improved theoretical guarantees for non-parametric models, but it is incremental as it builds on existing methods with refinements.

The paper tackles the problem of selecting the top-k items from m-wise partial rankings using the Borda counting algorithm, showing that accuracy depends on the score separation Δk, with asymptotic correctness when Δk is above a threshold and constant error probability below it, and provides tighter bounds for pairwise comparisons than prior work.

We analyze the performance of the Borda counting algorithm in a non-parametric model. The algorithm needs to utilize probabilistic rankings of the items within $m$-sized subsets to accurately determine which items are the overall top-$k$ items in a total of $n$ items. The Borda counting algorithm simply counts the cumulative scores for each item from these partial ranking observations. This generalizes a previous work of a similar nature by Shah et al. using probabilistic pairwise comparison data. The performance of the Borda counting algorithm critically depends on the associated score separation $Δ_k$ between the $k$-th item and the $(k+1)$-th item. Specifically, we show that if $Δ_k$ is greater than certain value, then the top-$k$ items selected by the algorithm is asymptotically accurate almost surely; if $Δ_k$ is below certain value, then the result will be inaccurate with a constant probability. In the special case of $m=2$, i.e., pairwise comparison, the resultant bound is tighter than that given by Shah et al., leading to a reduced gap between the error probability upper and lower bounds. These results are further extended to the approximate top-$k$ selection setting. Numerical experiments demonstrate the effectiveness and accuracy of the Borda counting algorithm, compared with the spectral MLE-based algorithm, particularly when the data does not necessarily follow an assumed parametric model.

View on arXiv PDF

Similar