LGMLMar 19, 2019

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

arXiv:1903.07890v310 citations
Originality Incremental advance
AI Analysis

This work addresses incremental theoretical advancements for researchers in online learning and bandit algorithms, focusing on regret bounds and variance analysis.

The paper tackles theoretical improvements in adversarial bandits by proving a first-order bound for a modified INF strategy, analyzing variance in follow-the-regularised-leader algorithms, and deriving gap-dependent bounds that generalize and improve prior results.

We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically Ω(n^2) where n is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of log(n)/log(log(n)).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes