MLLGJul 7, 2020

Stochastic Linear Bandits Robust to Adversarial Attacks

arXiv:2007.03285v284 citations
AI Analysis

This addresses robustness in bandit algorithms for scenarios like online advertising where rewards may be corrupted, offering incremental improvements over existing methods.

The paper tackles the problem of stochastic linear bandits under adversarial attacks with a corruption budget C, proposing Robust Phased Elimination algorithms that achieve near-optimal regret with additive terms linear or quadratic in C, and shows a greedy algorithm is robust in contextual settings.

We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon). We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants are shown to attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively having a linear and quadratic dependency on $C$ in general. We present algorithm independent lower bounds showing that these additive terms are near-optimal. In addition, in a contextual setting, we revisit a setup of diverse contexts, and show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes