LGGTMLMar 4, 2020

Bandits with adversarial scaling

arXiv:2003.02287v214 citations
AI Analysis

This addresses robustness in bandit algorithms for applications like display advertising, but it is incremental as it builds on existing adaptive methods.

The paper tackles the problem of multi-armed bandits with adversarial scaling, where rewards combine stochastic and adversarial components, by showing that most algorithms fail in certain settings, but two adaptive algorithms (action elimination and mirror descent) achieve robustness.

We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes