ML LGOct 11, 2018

Fighting Contextual Bandits with Stochastic Smoothing

arXiv:1810.05188v21.0

Originality Incremental advance

AI Analysis

This work provides a theoretical framework for improving regret bounds in adversarial contextual bandits, but it is incremental as it matches existing results like EXP4.

The authors tackled adversarial contextual bandit problems by introducing a stochastic smoothing perspective and a general algorithm template, achieving an O(√T) zero-order bound and an O(L_T^{*2/3}) first-order bound for regret.

We introduce a new stochastic smoothing perspective to study adversarial contextual bandit problems. We propose a general algorithm template that represents random perturbation based algorithms and identify several perturbation distributions that lead to strong regret bounds. Using the idea of smoothness, we provide an $O(\sqrt{T})$ zero-order bound for the vanilla algorithm and an $O(L^{*2/3}_{T})$ first-order bound for the clipped version. These bounds hold when the algorithms use with a variety of distributions that have a bounded hazard rate. Our algorithm template includes EXP4 as a special case corresponding to the Gumbel perturbation. Our regret bounds match existing results for EXP4 without relying on the specific properties of the algorithm.

View on arXiv PDF

Similar