MLAILGDec 23, 2015

The Max $K$-Armed Bandit: PAC Lower Bounds and Efficient Algorithms

arXiv:1512.07650v12 citations
Originality Incremental advance
AI Analysis

This work addresses sample efficiency in multi-armed bandit problems for decision-making agents, presenting incremental improvements in algorithm design and analysis.

The paper tackles the Max K-Armed Bandit problem by establishing a PAC lower bound on sample complexity for finding a near-optimal item and proposes an algorithm that achieves this bound up to logarithmic factors, with comparisons showing random arm selection can outperform when maximal rewards are similar.

We consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution. At each time step the agent chooses an arm, and observes the reward of the obtained sample. Each sample is considered here as a separate item with the reward designating its value, and the goal is to find an item with the highest possible value. Our basic assumption is a known lower bound on the {\em tail function} of the reward distributions. Under the PAC framework, we provide a lower bound on the sample complexity of any $(ε,δ)$-correct algorithm, and propose an algorithm that attains this bound up to logarithmic factors. We analyze the robustness of the proposed algorithm and in addition, we compare the performance of this algorithm to the variant in which the arms are not distinguishable by the agent and are chosen randomly at each stage. Interestingly, when the maximal rewards of the arms happen to be similar, the latter approach may provide better performance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes