Minimax Regret for Bandit Convex Optimisation of Ridge Functions
This provides a theoretical bound for a specific bandit optimization problem, which is incremental as it builds on existing work by restricting the adversary to ridge functions.
The paper tackles the problem of adversarial bandit convex optimization with ridge functions, where the adversary plays functions of the form f_t(x) = g_t(⟨x, θ⟩) with unknown θ, and proves that the minimax regret is at most O(d √n log(n diam(K))).
We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, θ\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $θ\in \mathbb R^d$ that is homogeneous over time. We provide a short information-theoretic proof that the minimax regret is at most $O(d \sqrt{n} \log(n \operatorname{diam}(\mathcal K)))$ where $n$ is the number of interactions, $d$ the dimension and $\operatorname{diam}(\mathcal K)$ is the diameter of the constraint set.