LG MLFeb 1, 2020

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

arXiv:2002.00287v319.555 citations

Originality Incremental advance

AI Analysis

This addresses robust decision-making under uncertainty in adversarial environments, such as online advertising or recommendation systems, but is incremental as it builds on existing Exp3 methods.

The paper tackles the adversarial linear contextual bandit problem with unrestricted loss changes, developing two algorithms: RealLinExp3 achieves regret of O~(√(KdT)), matching the best known bound, and RobustLinExp3 handles misspecification with regret O~((Kd)^{1/3}T^{2/3}) + ε√d T for bounded nonlinear errors.

We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d.~at random from a known distributions, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of $\widetilde{O}(\sqrt{KdT})$ over $T$ rounds, which matches the best available bound for this problem. Our second algorithm, RobustLinExp3, is shown to be robust to misspecification, in that it achieves a regret bound of $\widetilde{O}((Kd)^{1/3}T^{2/3}) + \varepsilon \sqrt{d} T$ if the true reward function is linear up to an additive nonlinear error uniformly bounded in absolute value by $\varepsilon$. To our knowledge, our performance guarantees constitute the very first results on this problem setting.

View on arXiv PDF

Similar