LGMay 27, 2016

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

arXiv:1605.08722v1117 citations
Originality Highly original
AI Analysis

This addresses the challenge of robust bandit algorithms for practitioners needing adaptability across different environments, though it is incremental by building on prior work.

The paper tackles the problem of designing a single algorithm that performs well in both stochastic and adversarial bandit settings, achieving nearly optimal pseudo-regret bounds: O(K√(n log n)) for adversarial and O(∑(log n)/Δ_i) for stochastic bandits.

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/Δ_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits. This complements previous results of Bubeck and Slivkins (2012) that show $\tilde{O}(\sqrt{n})$ expected adversarial regret with $O((\log n)^2)$ stochastic pseudo-regret.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes