Better Best of Both Worlds Bounds for Bandits with Switching Costs
This addresses the challenge of balancing adversarial and stochastic regret bounds in bandit problems with switching costs, which is incremental but improves specific theoretical guarantees.
The paper tackles the problem of designing best-of-both-worlds algorithms for bandits with switching costs, achieving a minimax optimal regret bound of O(T^{2/3}) in the adversarial setting and an improved bound of O(min{log(T)/Δ^2, T^{2/3}}) in the stochastically-constrained regime, with a lower bound showing Ω̃(min{1/Δ^2, T^{2/3}}) regret is unavoidable.
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting and a bound of $\mathcal{O}(\min\{\log (T)/Δ^2,T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $Δ$ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., that achieved regret of $\mathcal{O}(T^{1/3}/Δ)$. We accompany our results with a lower bound showing that, in general, $\tildeΩ(\min\{1/Δ^2,T^{2/3}\})$ regret is unavoidable in the stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$ worst-case regret.