STLGMLMay 24, 2016

Refined Lower Bounds for Adversarial Bandits

arXiv:1605.07416v263 citations
Originality Synthesis-oriented
AI Analysis

This work provides theoretical insights for researchers in online learning and bandit theory, but it is incremental as it refines existing lower bounds without introducing new methods.

The paper tackles the problem of establishing lower bounds on regret for adversarial bandit algorithms, showing that recent upper bounds for high-probability, best-arm loss, and quadratic variation are nearly tight, and proves impossibility results regarding optimal arms and loss ranges.

We provide new lower bounds on the regret that must be suffered by adversarial bandit algorithms. The new results show that recent upper bounds that either (a) hold with high-probability or (b) depend on the total lossof the best arm or (c) depend on the quadratic variation of the losses, are close to tight. Besides this we prove two impossibility results. First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case. Second, the regret cannot scale with the effective range of the losses. In contrast, both results are possible in the full-information setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes