LG AI DS MLJul 27, 2025

Online Learning with Probing for Sequential User-Centric Selection

Tianyi Xu, Yiting Chen, Henger Li, Zheyong Bian, Emiliano Dall'Anese, Zizhan Zheng

arXiv:2507.20112v21 citationsh-index: 12ECAI

Originality Incremental advance

AI Analysis

This work addresses resource allocation challenges in applications like ridesharing and content recommendation, where both resources and payoffs are initially unknown, but it is incremental as it builds on existing bandit and probing frameworks.

The authors tackled the problem of sequential decision-making with costly information acquisition, formalized as the probing-augmented user-centric selection (PUCS) framework, and developed algorithms for offline and online settings, achieving a constant-factor approximation guarantee of ζ = (e-1)/(2e-1) and a regret bound of O(√T + ln² T) with a matching lower bound of Ω(√T).

We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $ζ= (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound $Ω(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.

View on arXiv PDF

Similar