LGAIOct 10, 2025

Bandits with Single-Peaked Preferences and Limited Resources

arXiv:2510.09425v1h-index: 1
Originality Incremental advance
AI Analysis

This addresses computational challenges in matching problems for resource allocation, but is incremental as it builds on known structural assumptions from social choice theory.

The paper tackles the online stochastic matching problem with budget constraints by assuming single-peaked user preferences, developing efficient offline and online algorithms that achieve regret bounds of ˜O(UKT^{2/3}) and ˜O(U√TK) respectively.

We study an online stochastic matching problem in which an algorithm sequentially matches $U$ users to $K$ arms, aiming to maximize cumulative reward over $T$ rounds under budget constraints. Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible. To overcome this barrier, we focus on \emph{single-peaked preferences} -- a well-established structure in social choice theory, where users' preferences are unimodal with respect to a common order over arms. We devise an efficient algorithm for the offline budgeted matching problem, and leverage it into an efficient online algorithm with a regret of $\tilde O(UKT^{2/3})$. Our approach relies on a novel PQ tree-based order approximation method. If the single-peaked structure is known, we develop an efficient UCB-like algorithm that achieves a regret bound of $\tilde O(U\sqrt{TK})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes