Mirror Descent and the Information Ratio
This work, aimed at researchers in online learning and optimization, addresses the problem of improving regret bounds for adversarial bandits, offering a new theoretical connection and an efficient algorithm.
The paper connects the stability of mirror descent to the information ratio. It shows that mirror descent with appropriate loss estimators and exploratory distributions achieves a bound on the adversarial regret identical to the known bounds on the Bayesian regret of information-directed sampling, and it provides an efficient algorithm for adversarial bandits whose regret matches the best known information-theoretic upper bound.
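To make "mirror descent with loss estimators and exploratory distributions" concrete, here is a minimal sketch of the simplest instance: exponential weights (mirror descent, equivalently FTRL, with the negative-entropy regularizer on the probability simplex) applied to importance-weighted loss estimates, with exploration obtained by mixing in the uniform distribution. This is an Exp3-style baseline, not the paper's algorithm: the paper chooses its exploratory distributions far more carefully. The function name `exp3_mixed`, its parameters `eta` (learning rate) and `gamma` (mixing weight), and the assumption that losses lie in [0, 1] are all hypothetical choices for illustration.

```python
import math
import random


def exp3_mixed(loss_fn, n_arms, horizon, eta, gamma, seed=0):
    """Hypothetical sketch: exponential weights on importance-weighted
    loss estimates, played through a uniform-mixing exploratory distribution.

    loss_fn(t, arm) must return a loss in [0, 1] (assumption).
    Returns the total incurred loss and the final mirror-descent iterate q.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms  # cumulative estimated loss of each arm
    total_loss = 0.0
    q = [1.0 / n_arms] * n_arms
    for t in range(horizon):
        # Mirror descent with negentropy: q is proportional to exp(-eta * cum_est).
        # Subtracting the minimum keeps the exponentials numerically stable.
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(w)
        q = [wi / z for wi in w]
        # Exploratory distribution: mix the iterate with the uniform distribution.
        p = [(1.0 - gamma) * qi + gamma / n_arms for qi in q]
        arm = rng.choices(range(n_arms), weights=p)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        # Importance-weighted estimate: unbiased because E[1{A=a} * loss / p[a]] = loss_a.
        cum_est[arm] += loss / p[arm]
    return total_loss, q
```

With `gamma = 0` this reduces to Exp3 without explicit mixing; the mixing term bounds the importance weights by `n_arms / gamma`, which is the crude form of the stability control that the paper replaces with a sharper, information-ratio-based choice of exploration.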
We establish a connection between the stability of mirror descent and the information ratio introduced by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the known bounds on the Bayesian regret of information-directed sampling. Along the way, we develop the theory of information-directed sampling and provide an efficient algorithm for adversarial bandits whose regret upper bound exactly matches the best known information-theoretic upper bound.
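For readers new to the information ratio, the following sketch computes it in a toy Bayesian setting, assuming a finite set of environments with Bernoulli rewards and a known posterior over them. For each action a it computes the expected regret Delta(a) and the information gain I(A*; Y_a) about the optimal action A*, then finds an information-directed sampling (IDS) distribution by minimizing Delta(pi)^2 / I(pi). The search is restricted to distributions on at most two actions, since Russo and Van Roy show a minimizer exists among such distributions; the grid search over mixing weights and all function names (`binary_entropy`, `regret_and_info`, `ids_distribution`) are hypothetical choices for illustration.

```python
import itertools
import math


def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable, in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regret_and_info(prior, means):
    """Per-action expected regret Delta(a) and information gain I(A*; Y_a).

    prior[k]    : posterior probability of environment k (assumption: finite set)
    means[k][a] : Bernoulli reward mean of action a in environment k
    """
    n_actions = len(means[0])
    best = [max(range(n_actions), key=lambda a: m[a]) for m in means]  # A* per env
    opt_value = sum(w * m[b] for w, m, b in zip(prior, means, best))
    delta = [opt_value - sum(w * m[a] for w, m in zip(prior, means))
             for a in range(n_actions)]
    p_star = [sum(w for w, b in zip(prior, best) if b == j) for j in range(n_actions)]
    info = []
    for a in range(n_actions):
        p_y = sum(w * m[a] for w, m in zip(prior, means))  # P(Y_a = 1)
        cond = 0.0  # H(Y_a | A*)
        for j in range(n_actions):
            if p_star[j] > 0:
                p_y_j = sum(w * m[a] for w, m, b in zip(prior, means, best) if b == j) / p_star[j]
                cond += p_star[j] * binary_entropy(p_y_j)
        info.append(binary_entropy(p_y) - cond)  # I(A*; Y_a) = H(Y_a) - H(Y_a | A*)
    return delta, info


def ids_distribution(delta, info, grid=1000):
    """Minimise Delta(pi)^2 / I(pi) over distributions on at most two actions."""
    n = len(delta)
    best_ratio, best_pi = math.inf, [1.0 / n] * n  # fallback if every ratio is infinite
    for i, j in itertools.combinations_with_replacement(range(n), 2):
        for s in range(grid + 1):
            q = s / grid
            d = q * delta[i] + (1.0 - q) * delta[j]
            g = q * info[i] + (1.0 - q) * info[j]
            ratio = 0.0 if d <= 0 else (d * d / g if g > 0 else math.inf)
            if ratio < best_ratio:
                pi = [0.0] * n
                pi[i] += q
                pi[j] += 1.0 - q
                best_ratio, best_pi = ratio, pi
    return best_pi, best_ratio
```

In the two-environment example with means `[[0.9, 0.1], [0.1, 0.9]]` and a uniform prior, every action has Delta(a) = 0.4 and the same information gain, so every distribution attains the same ratio; adding an action whose reward is identical in both environments gives it zero information gain, and IDS never selects it while its regret is positive.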