OC LG MLJun 10, 2024

Online Newton Method for Bandit Convex Optimisation

Hidde Fokkema, Dirk van der Hoeven, Tor Lattimore, Jack J. Mayo

arXiv:2406.06506v114.49 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of efficient optimization with limited feedback for researchers in online learning and optimization, representing an incremental improvement with specific computational gains.

The paper tackles the problem of bandit convex optimization by introducing a computationally efficient algorithm, achieving a regret bound of at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ in the adversarial setting and $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ in the stochastic setting, where $d$ is dimension and $n$ is time horizon.

We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where $M \in [d^{-1/2}, d^{-1 / 4}]$ is a constant that depends on the geometry of the constraint set and the desired computational properties.

View on arXiv PDF

Similar