LGMLFeb 19, 2021

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

arXiv:2102.09864v128 citations
Originality Incremental advance
AI Analysis

This work addresses a key challenge in online decision-making for scenarios like resource allocation, though it appears incremental as it adapts an existing method to incorporate switching costs.

The paper tackles the problem of multi-armed bandits with switching costs, proposing an algorithm that achieves minimax optimal regret bounds in both adversarial and stochastic regimes, with experimental results showing competitiveness against baselines.

We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $λ$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((λK)^{1/3}T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((λK)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} Δ_i^{-1}\right)$, where $Δ_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm. In the special case of $λ= 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem, where switching cost is allowed to change over time. We provide experimental evaluation showing competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes