DSLGSPApr 11, 2022

On Top-$k$ Selection from $m$-wise Partial Rankings via Borda Counting

arXiv:2204.05742v16 citationsh-index: 32
Originality Incremental advance
AI Analysis

This work addresses ranking and selection problems in data analysis, offering improved theoretical guarantees for non-parametric models, but it is incremental as it builds on existing methods with refinements.

The paper tackles the problem of selecting the top-k items from m-wise partial rankings using the Borda counting algorithm, showing that accuracy depends on the score separation Δk, with asymptotic correctness when Δk is above a threshold and constant error probability below it, and provides tighter bounds for pairwise comparisons than prior work.

We analyze the performance of the Borda counting algorithm in a non-parametric model. The algorithm needs to utilize probabilistic rankings of the items within $m$-sized subsets to accurately determine which items are the overall top-$k$ items in a total of $n$ items. The Borda counting algorithm simply counts the cumulative scores for each item from these partial ranking observations. This generalizes a previous work of a similar nature by Shah et al. using probabilistic pairwise comparison data. The performance of the Borda counting algorithm critically depends on the associated score separation $Δ_k$ between the $k$-th item and the $(k+1)$-th item. Specifically, we show that if $Δ_k$ is greater than certain value, then the top-$k$ items selected by the algorithm is asymptotically accurate almost surely; if $Δ_k$ is below certain value, then the result will be inaccurate with a constant probability. In the special case of $m=2$, i.e., pairwise comparison, the resultant bound is tighter than that given by Shah et al., leading to a reduced gap between the error probability upper and lower bounds. These results are further extended to the approximate top-$k$ selection setting. Numerical experiments demonstrate the effectiveness and accuracy of the Borda counting algorithm, compared with the spectral MLE-based algorithm, particularly when the data does not necessarily follow an assumed parametric model.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes