OCLGMLMay 31, 2020

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

arXiv:2006.00475v348 citations
AI Analysis

This work addresses a theoretical problem in online optimization for researchers, providing a significant reduction in regret bounds, though it is incremental as it builds directly on prior results.

The paper tackles the problem of zeroth-order adversarial bandit convex optimisation by proving an improved minimax regret bound of O(d^{2.5} √n log(n)), which reduces the previous bound of O(d^{9.5} √n log(n)^{7.5}) by Bubeck et al. (2017).

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5}$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes