LGJan 30, 2025

Bandits with Anytime Knapsacks

arXiv:2501.18560v1h-index: 3
Originality Incremental advance
AI Analysis

This work addresses a more complex constraint in bandit problems for researchers in online learning and decision-making, but it is incremental as it adapts existing methods to a new constraint without surpassing prior bounds.

The paper tackles the problem of bandits with anytime knapsacks (BwAK), which introduces an anytime cost constraint instead of a total budget, adding complexity by requiring constraint adherence throughout decision-making. The result is the SUAK algorithm, which achieves a problem-dependent regret upper bound of O(K log T), matching prior work in simpler settings, as verified by simulations.

We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $ O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes