Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
This work addresses a theoretical problem in online optimization for researchers, providing a significant reduction in regret bounds, though it is incremental as it builds directly on prior results.
The paper tackles the problem of zeroth-order adversarial bandit convex optimisation by proving an improved minimax regret bound of O(d^{2.5} √n log(n)), which reduces the previous bound of O(d^{9.5} √n log(n)^{7.5}) by Bubeck et al. (2017).
We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5}$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.