LGJun 1, 2016

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

arXiv:1606.00313v144 citations
Originality Highly original
AI Analysis

This work provides a significant improvement in regret bounds for adversarial contextual bandits, addressing a key challenge in online learning with applications in recommendation systems and decision-making under uncertainty.

The paper tackles the adversarial contextual bandit problem by proposing an oracle-based algorithm that achieves a regret bound of O((KT)^{2/3}(log N)^{1/3}), breaking the previous O(T^{3/4}) barrier, which was a major open problem.

We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is the number of actions, $T$ is the number of iterations and $N$ is the number of baseline policies. Our result is the first to break the $O(T^{\frac{3}{4}})$ barrier that is achieved by recently introduced algorithms. Breaking this barrier was left as a major open problem. Our analysis is based on the recent relaxation based approach of (Rakhlin and Sridharan, 2016).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes