LGAIMLApr 16, 2018

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

arXiv:1804.05929v17 citations
Originality Incremental advance
AI Analysis

This addresses a practical problem for practitioners in sequential decision-making by offering a tunable trade-off between optimality and computational efficiency, though it is incremental in improving existing bandit methods.

The paper tackles the trade-off between optimality and computational complexity in stochastic multi-armed bandit algorithms by proposing UCBoost, a boosting approach that achieves near-optimal regret with low complexity, such as reducing computational cost to 1% of kl-UCB while matching its regret performance.

In this work, we address the open problem of finding low-complexity near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We propose a boosting approach to Upper Confidence Bound based algorithms for stochastic bandits, that we call UCBoost. Specifically, we propose two types of UCBoost algorithms. We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round as well as regret guarantee that is $1/e$-close to that of the kl-UCB algorithm. We propose an approximation-based UCBoost algorithm, UCBoost($ε$), that enjoys a regret guarantee $ε$-close to that of kl-UCB as well as $O(\log(1/ε))$ complexity for each arm per round. Hence, our algorithms provide practitioners a practical way to trade optimality with computational complexity. Finally, we present numerical results which show that UCBoost($ε$) can achieve the same regret performance as the standard kl-UCB while incurring only $1\%$ of the computational cost of kl-UCB.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes