On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
This work addresses incremental theoretical advancements for researchers in online learning and bandit algorithms, focusing on regret bounds and variance analysis.
The paper tackles theoretical improvements in adversarial bandits by proving a first-order bound for a modified INF strategy, analyzing variance in follow-the-regularised-leader algorithms, and deriving gap-dependent bounds that generalize and improve prior results.
We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically Ω(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).