LG MLJun 28, 2018

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

arXiv:1806.10745v225 citations

AI Analysis

This work provides incremental improvements in regret bounds and algorithm efficiency for contextual bandit problems, relevant to online learning and decision-making systems.

The paper tackles contextual bandit learning by using surrogate losses to derive new regret bounds and efficient algorithms, resulting in margin-based regret bounds and a mistake bound of √(dT) for d-dimensional regressors.

We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive new margin-based regret bounds in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield classical regret bounds.

View on arXiv PDF

Similar