LG DS GT MLMar 25, 2018

Stochastic bandits robust to adversarial corruptions

Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme

arXiv:1803.09353v128.8228 citations

Originality Highly original

AI Analysis

This work addresses the challenge of robustness in bandit algorithms for mixed adversarial-stochastic settings, which is incremental as it builds on existing bandit models by incorporating corruption handling.

The paper tackles the problem of designing bandit algorithms robust to adversarial corruptions in stochastic environments, such as click fraud or fake reviews, by introducing a model where rewards are initially stochastic but can be altered by an adversary. It presents an algorithm that achieves near-optimal performance in stochastic settings and degrades linearly with the total corruption C, while being agnostic to C, and provides a lower bound showing this degradation is necessary.

We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm, e.g., click fraud, fake reviews and email spam. The goal of this model is to encourage the design of bandit algorithms that (i) work well in mixed adversarial and stochastic models, and (ii) whose performance deteriorates gracefully as we move from fully stochastic to fully adversarial models. In our model, the rewards for all arms are initially drawn from a distribution and are then altered by an adaptive adversary. We provide a simple algorithm whose performance gracefully degrades with the total corruption the adversary injected in the data, measured by the sum across rounds of the biggest alteration the adversary made in the data in that round; this total corruption is denoted by $C$. Our algorithm provides a guarantee that retains the optimal guarantee (up to a logarithmic term) if the input is stochastic and whose performance degrades linearly to the amount of corruption $C$, while crucially being agnostic to it. We also provide a lower bound showing that this linear degradation is necessary if the algorithm achieves optimal performance in the stochastic setting (the lower bound works even for a known amount of corruption, a special case in which our algorithm achieves optimal performance without the extra logarithm).

View on arXiv PDF

Similar