LGJan 4, 2021

Be Greedy in Multi-Armed Bandits

Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

arXiv:2101.01086v112.511 citations

Originality Highly original

AI Analysis

This work challenges the conventional wisdom regarding the Greedy algorithm's poor theoretical performance for researchers and practitioners working with multi-armed bandits, suggesting it can be a highly competitive and computationally efficient choice in specific settings.

The paper investigates the performance of the Greedy algorithm in multi-armed bandit problems, particularly when the number of arms is large. It demonstrates that Greedy, especially with arm subsampling, achieves near-optimal worst-case regret bounds in continuous, infinite, and many-armed bandit problems, and shows significant practical improvements over state-of-the-art methods.

The Greedy algorithm is the simplest heuristic in sequential decision problem that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known to sometimes have poor performances, for instance even a linear regret (with respect to the time horizon) in the standard multi-armed bandit problem. On the other hand, this heuristic performs reasonably well in practice and it even has sublinear, and even near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models. We build on a recent line of work and investigate bandit settings where the number of arms is relatively large and where simple greedy algorithms enjoy highly competitive performance, both in theory and in practice. We first provide a generic worst-case bound on the regret of the Greedy algorithm. When combined with some arms subsampling, we prove that it verifies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems. Moreover, for shorter time spans, the theoretical relative suboptimality of Greedy is even reduced. As a consequence, we subversively claim that for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performances and computational burden is definitely to follow the greedy heuristic. We support our claim by many numerical experiments that show significant improvements compared to the state-of-the-art, even for moderately long time horizon.

View on arXiv PDF

Similar