LGJan 4, 2021

Be Greedy in Multi-Armed Bandits

arXiv:2101.01086v111 citations
Originality Highly original
AI Analysis

This work challenges the conventional wisdom regarding the Greedy algorithm's poor theoretical performance for researchers and practitioners working with multi-armed bandits, suggesting it can be a highly competitive and computationally efficient choice in specific settings.

The paper investigates the performance of the Greedy algorithm in multi-armed bandit problems, particularly when the number of arms is large. It demonstrates that Greedy, especially with arm subsampling, achieves near-optimal worst-case regret bounds in continuous, infinite, and many-armed bandit problems, and shows significant practical improvements over state-of-the-art methods.

The Greedy algorithm is the simplest heuristic in sequential decision problem that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known to sometimes have poor performances, for instance even a linear regret (with respect to the time horizon) in the standard multi-armed bandit problem. On the other hand, this heuristic performs reasonably well in practice and it even has sublinear, and even near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models. We build on a recent line of work and investigate bandit settings where the number of arms is relatively large and where simple greedy algorithms enjoy highly competitive performance, both in theory and in practice. We first provide a generic worst-case bound on the regret of the Greedy algorithm. When combined with some arms subsampling, we prove that it verifies near-optimal worst-case regret bounds in continuous, infinite and many-armed bandit problems. Moreover, for shorter time spans, the theoretical relative suboptimality of Greedy is even reduced. As a consequence, we subversively claim that for many interesting problems and associated horizons, the best compromise between theoretical guarantees, practical performances and computational burden is definitely to follow the greedy heuristic. We support our claim by many numerical experiments that show significant improvements compared to the state-of-the-art, even for moderately long time horizon.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes