ST LGFeb 23, 2016

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier, Pierre Ménard, Gilles Stoltz

arXiv:1602.07182v328.4232 citations

Originality Synthesis-oriented

AI Analysis

This work provides theoretical insights into regret dynamics for researchers in bandit problems, but it is incremental as it refines existing bounds with simpler proofs.

The paper revisits lower bounds on regret in multi-armed bandit problems, showing that regret grows almost linearly in an initial phase and logarithmically only in a final phase, with non-asymptotic, distribution-dependent bounds derived using Kullback-Leibler divergences.

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.

View on arXiv PDF

Similar