OC LG MLMay 31, 2020

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

arXiv:2006.00475v321.048 citations

Originality Incremental advance

AI Analysis

This work addresses a theoretical problem in online optimization for researchers, providing a significant reduction in regret bounds, though it is incremental as it builds directly on prior results.

The paper tackles the problem of zeroth-order adversarial bandit convex optimisation by proving an improved minimax regret bound of O(d^{2.5} √n log(n)), which reduces the previous bound of O(d^{9.5} √n log(n)^{7.5}) by Bubeck et al. (2017).

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5}$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.

View on arXiv PDF

Similar