Multi-scale exploration of convex functions and bandit convex optimization
This work solves a long-standing open problem in online learning, a foundational advance for bandit convex optimization with broad implications for machine learning.
The paper addresses adversarial bandit convex optimization by constructing a multi-scale exploration map from convex functions to distributions, yielding a minimax regret bound of $\tilde{O}(\mathrm{poly}(n) \sqrt{T})$ and solving a decade-old open problem.
We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. We use this map to solve a decade-old open problem in adversarial bandit convex optimization by showing that the minimax regret for this problem is $\tilde{O}(\mathrm{poly}(n) \sqrt{T})$, where $n$ is the dimension and $T$ the number of rounds. This bound is obtained by studying the dual Bayesian maximin regret via the information ratio analysis of Russo and Van Roy, and then using the multi-scale exploration to solve the Bayesian problem.
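For orientation, the quantities named in the abstract can be written out as follows. This is a sketch in standard notation, not necessarily the paper's own: the symbols $\mathcal{K}$ (convex domain), $\ell_t$ (convex loss in round $t$), $x_t$ (point played), $x^*$ (best fixed point in hindsight), $\Gamma$ (a uniform bound on the information ratio), $I_t$ (mutual information under the round-$t$ posterior) and $H$ (entropy) are assumptions introduced here. The last implication is the generic Russo and Van Roy bound for an optimum with finite entropy; the paper's precise statement may differ, for example through a discretization of $\mathcal{K}$.

$$
R_T = \sum_{t=1}^{T} \ell_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} \ell_t(x),
\qquad
\inf_{\text{strategies}} \; \sup_{\ell_1, \dots, \ell_T} \mathbb{E}[R_T] = \tilde{O}(\mathrm{poly}(n) \sqrt{T}).
$$

$$
\Big( \mathbb{E}_t\big[\ell_t(x_t) - \ell_t(x^*)\big] \Big)^2 \le \Gamma \, I_t\big(x^*; (x_t, \ell_t(x_t))\big) \ \text{ for all } t
\quad \Longrightarrow \quad
\mathbb{E}[R_T] \le \sqrt{\Gamma \, H(x^*) \, T}.
$$

Under this reading, a bound on $\Gamma$ that is polynomial in $n$, together with a suitably controlled entropy term, would give the $\sqrt{T}$ rate for the Bayesian problem, which is the role the abstract assigns to the multi-scale exploration distribution.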