LGAIMLFeb 20, 2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

arXiv:2302.09739v130 citationsh-index: 26
Originality Highly original
AI Analysis

This provides a foundational tool for online learning researchers, enabling efficient algorithm design across domains like bandits and MDPs without incremental tuning.

The paper tackles the problem of designing algorithms that achieve optimal regret in both adversarial and stochastic regimes for online learning, resolving an open question by presenting a general reduction that transforms existing worst-case algorithms into best-of-both-worlds algorithms, achieving O(log(T)) regret in stochastic and O(√T) in adversarial regimes.

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialised potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can simultaneously obtain $O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD) algorithms. We showcase the capability of this reduction by transforming existing algorithms that are only known to achieve worst-case guarantees into new algorithms with best-of-both-worlds guarantees in contextual bandits, graph bandits and tabular Markov decision processes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes