AI · Dec 25, 2025

Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Hong Xie, Haoran Gu, Yanying Huang, Tao Tan, Defu Lian

arXiv:2512.21626v1 · 3.3 · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses resource-allocation problems in applications such as LLMs and edge intelligence, but its contribution is incremental: it extends existing multiple-play bandit models with a prioritized capacity-sharing mechanism.

The paper studies resource allocation in multiple-play stochastic bandits with prioritized capacity sharing. It proves instance-independent and instance-dependent regret lower bounds and designs an approximate UCB-based algorithm whose regret upper bounds match them up to factors of $\sqrt{K \ln(KT)}$ and $α_1 K^2$, respectively.
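Multiplying each lower bound reported in the abstract by its stated gap factor gives the order of the corresponding upper bounds. This is our own arithmetic with constants omitted, not a bound quoted from the paper; $R_T$ is assumed notation for the $T$-round regret, $α_1$ is the largest priority weight, $σ$ the reward-tail parameter, and $Δ$ the gap parameter appearing in the instance-dependent bound:

```latex
R_T^{\mathrm{indep}} = O\!\left(\alpha_1 \sigma \sqrt{KMT}\cdot\sqrt{K\ln(KT)}\right)
                     = O\!\left(\alpha_1 \sigma K \sqrt{MT\ln(KT)}\right),
\qquad
R_T^{\mathrm{dep}} = O\!\left(\alpha_1 \sigma^2 \frac{M}{\Delta}\ln T \cdot \alpha_1 K^2\right)
                   = O\!\left(\frac{\alpha_1^2 \sigma^2 K^2 M}{\Delta}\ln T\right).
```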

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising in LLM applications, edge intelligence, and similar settings. The model consists of $M$ arms and $K$ plays. Each arm has a stochastic number of capacity units, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for an arm's capacity, the capacity is allocated in descending order of priority weight (larger weights first). Instance-independent and instance-dependent regret lower bounds of $Ω(α_1 σ \sqrt{KMT})$ and $Ω(α_1 σ^2 \frac{M}{Δ} \ln T)$ are proved, where $α_1$ is the largest priority weight and $σ$ characterizes the reward tail. When the model parameters are known, we design an algorithm named MSB-PRS-OffOpt that finds the optimal play allocation policy with a computational complexity of $O(MK^3)$. Using MSB-PRS-OffOpt as a subroutine, we design an approximate upper confidence bound (UCB) based algorithm whose instance-independent and instance-dependent regret upper bounds match the corresponding lower bounds up to factors of $\sqrt{K \ln(KT)}$ and $α_1 K^2$, respectively. In doing so, we address nontrivial technical challenges arising from optimizing and learning under a special nonlinear combinatorial utility function induced by the prioritized resource sharing mechanism.
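The prioritized sharing rule can be illustrated with a minimal Python sketch of a single round under a fixed assignment of plays to arms: plays are served in descending priority weight until each arm's realized capacity runs out. The `Play` fields, the per-unit reward table `unit_rewards`, and the weighted-sum utility are hypothetical simplifications chosen for illustration; they are not the paper's exact reward model, and this is not the MSB-PRS-OffOpt algorithm.

```python
from dataclasses import dataclass


@dataclass
class Play:
    name: str
    weight: float  # priority weight (hypothetical field name)
    arm: int       # index of the arm this play is assigned to


def prioritized_allocation(plays, capacities, unit_rewards):
    """Allocate arm capacity units to plays, larger priority weight first.

    capacities[m]: realized number of capacity units of arm m this round.
    unit_rewards[m][i]: assumed reward of the i-th unit handed out on arm m.
    Returns (served, utility): served maps play name to the unit index it
    received (or None), and utility is the sum of weight * unit reward over
    served plays (an assumed utility form, for illustration only).
    """
    used = [0] * len(capacities)  # units already handed out on each arm
    served = {}
    utility = 0.0
    # sorted() is stable even with reverse=True, so equal weights keep input order.
    for p in sorted(plays, key=lambda q: q.weight, reverse=True):
        if used[p.arm] < capacities[p.arm]:
            i = used[p.arm]
            used[p.arm] += 1
            served[p.name] = i
            utility += p.weight * unit_rewards[p.arm][i]
        else:
            served[p.name] = None  # arm exhausted by higher-priority plays
    return served, utility


if __name__ == "__main__":
    plays = [Play("a", 1.0, 0), Play("b", 3.0, 0), Play("c", 2.0, 0), Play("d", 0.5, 1)]
    served, u = prioritized_allocation(plays, [2, 1], [[1.0, 0.5], [2.0]])
    print(served)  # {'b': 0, 'c': 1, 'a': None, 'd': 0}
    print(u)       # 5.0
```

In the example, arm 0 has only two units, so play `a` (weight 1.0) is crowded out by the higher-priority plays `b` and `c`, which illustrates why the resulting utility is a nonlinear combinatorial function of the play allocation.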
