Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces
This work addresses the problem of adversarial online learning for researchers, providing a Bayesian approach that extends to infinite action spaces, though it is incremental as it builds on existing Thompson sampling and Bayesian optimization methods.
The paper tackles online learning under full feedback by developing a Thompson sampling variant where the prior is over the adversary's future actions, showing that regret decomposes into expected regret plus an excess regret term. In finite-expert settings, it recovers optimal rates, and for infinite action spaces, it achieves a rate of O(β√(Td log(1+√d λ/β))) against bounded Lipschitz adversaries.
We develop a form Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling over the $d$-dimensional unit cube, using a certain Gaussian process prior widely-used in the Bayesian optimization literature, has a $\mathcal{O}\Big(β\sqrt{Td\log(1+\sqrt{d}\fracλβ)}\Big)$ rate against a $β$-bounded $λ$-Lipschitz adversary.