AI GTJul 17, 2017

Online Multi-Armed Bandit

arXiv:1707.04987v11.7

Originality Incremental advance

AI Analysis

This addresses a sequential decision-making problem in online learning with applications in resource allocation, but it is incremental as it modifies a classic bandit setting.

The paper tackles the problem of an online multi-armed bandit variant where bandits are streamed sequentially and cannot be revisited, studying Bernoulli bandits with means drawn from distributions like uniform. It proposes strategies that are shown to be optimal up to a constant factor in expected performance, including cases where the distribution is unknown.

We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber($p_i$) for some underlying mean $p_i$) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we have suggested are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time. We again, are able to suggest algorithms that are optimal up to a constant factor for this case, given certain mild conditions on the universe of distributions.

View on arXiv PDF

Similar