STLGFeb 23, 2016

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

arXiv:1602.07182v3232 citations
Originality Synthesis-oriented
AI Analysis

This work provides theoretical insights into regret dynamics for researchers in bandit problems, but it is incremental as it refines existing bounds with simpler proofs.

The paper revisits lower bounds on regret in multi-armed bandit problems, showing that regret grows almost linearly in an initial phase and logarithmically only in a final phase, with non-asymptotic, distribution-dependent bounds derived using Kullback-Leibler divergences.

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes