ML LG ST MEJun 2, 2025

Assumption-free stability for ranking problems

Ruiting Liang, Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

arXiv:2506.02257v14.52 citationsh-index: 4

Originality Highly original

AI Analysis

This work addresses the problem of ranking instability for practitioners in fields like recommendation systems, offering a solution that is not incremental but introduces a new framework.

The authors tackled the instability of ranking problems under noisy data by developing a new algorithmic stability framework and proposing two novel ranking operators, the inflated top-k and inflated full ranking, which provide guaranteed stability without assumptions on data distributions or dependence on the number of candidates.

In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that estimating a ranking from noisy data can exhibit high sensitivity to small perturbations. Concretely, if we use data to provide a score for each item (say, by aggregating preference data over a sample of users), then for two items with similar scores, small fluctuations in the data can alter the relative ranking of those items. Many existing theoretical results for ranking problems assume a separation condition to avoid this challenge, but real-world data often contains items whose scores are approximately tied, limiting the applicability of existing theory. To address this gap, we develop a new algorithmic stability framework for ranking problems, and propose two novel ranking operators for achieving stable ranking: the \emph{inflated top-$k$} for the top-$k$ selection problem and the \emph{inflated full ranking} for ranking the full list. To enable stability, each method allows for expressing some uncertainty in the output. For both of these two problems, our proposed methods provide guaranteed stability, with no assumptions on data distributions and no dependence on the total number of candidates to be ranked. Experiments on real-world data confirm that the proposed methods offer stability without compromising the informativeness of the output.

View on arXiv PDF

Similar