MLLGNov 6, 2024

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

arXiv:2411.14446v22 citationsh-index: 14
Originality Incremental advance
AI Analysis

This work addresses a specific variant of bandit problems for sequential decision-making, offering incremental improvements in algorithm design and theoretical analysis.

The paper tackles the problem of regret minimization in stochastic multi-armed bandits with rested arms having monotonically non-decreasing and concave expected rewards, deriving lower bounds and proposing an algorithm (R-ed-UCB) that achieves a regret bound of ̃O(T^{2/3}) under certain conditions.

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. $arm$). We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave. We study the inherent sample complexity of the regret minimization problem by deriving suitable regret lower bounds. Then, we design an algorithm for the rested case $\textit{R-ed-UCB}$, providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes