Online Newton Method for Bandit Convex Optimisation
This addresses the challenge of efficient optimization with limited feedback for researchers in online learning and optimization, representing an incremental improvement with specific computational gains.
The paper tackles the problem of bandit convex optimization by introducing a computationally efficient algorithm, achieving a regret bound of at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ in the adversarial setting and $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ in the stochastic setting, where $d$ is dimension and $n$ is time horizon.
We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.