Adversarial bandit optimization for approximately linear functions
This addresses adversarial bandit optimization for approximately linear functions, which is incremental as it builds on existing linear bandit methods.
The paper tackles the problem of bandit optimization for nonconvex and non-smooth functions, where the loss function includes a linear component plus an adversarial perturbation, providing both expected and high-probability regret bounds and an improved bound for the linear case.
We consider a bandit optimization problem for nonconvex and non-smooth functions, where in each trial the loss function is the sum of a linear function and a small but arbitrary perturbation chosen after observing the player's choice. We give both expected and high probability regret bounds for the problem. Our result also implies an improved high-probability regret bound for the bandit linear optimization, a special case with no perturbation. We also give a lower bound on the expected regret.