LG MLMay 28, 2019

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

arXiv:1905.11817v114.837 citationsh-index: 27

Originality Incremental advance

AI Analysis

This work addresses the theoretical understanding and algorithm design for online learning problems, such as bandits and optimization, by bridging information-theoretic and convex optimization approaches, with incremental improvements in regret bounds.

The paper formalizes the connection between information-theoretic analysis and mirror descent techniques in online learning, showing that existing methods can derive similar bounds, and introduces an efficient algorithm for k-armed adversarial bandits that matches the best information-theoretic upper bound while improving regret guarantees for online linear optimization on ℓ_p-balls and bandits with graph feedback.

The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, for $k$-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound and improve best known regret guarantees for online linear optimisation on $\ell_p$-balls and bandits with graph feedback.

View on arXiv PDF

Similar