Bandits with Anytime Knapsacks
This work addresses a more complex constraint in bandit problems for researchers in online learning and decision-making, but it is incremental as it adapts existing methods to a new constraint without surpassing prior bounds.
The paper tackles the problem of bandits with anytime knapsacks (BwAK), which introduces an anytime cost constraint instead of a total budget, adding complexity by requiring constraint adherence throughout decision-making. The result is the SUAK algorithm, which achieves a problem-dependent regret upper bound of O(K log T), matching prior work in simpler settings, as verified by simulations.
We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem where there is an \textit{anytime} cost constraint instead of a total cost budget. This problem setting introduces additional complexities as it mandates adherence to the constraint throughout the decision-making process. We propose SUAK, an algorithm that utilizes upper confidence bounds to identify the optimal mixture of arms while maintaining a balance between exploration and exploitation. SUAK is an adaptive algorithm that strategically utilizes the available budget in each round in the decision-making process and skips a round when it is possible to violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $ O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.