LG MLMar 3, 2020

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

arXiv:2003.01328v54.22 citations

Originality Incremental advance

AI Analysis

This work addresses faster learning in bandit problems with known finite parameter sets, offering incremental improvements over standard methods like UCB.

The paper tackles the problem of finitely parameterized multi-armed bandits by proposing the FP-UCB algorithm, which achieves bounded regret under a structural condition on the parameter set and logarithmic regret with a smaller constant otherwise, as validated through numerical simulations.

We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, which uses the information about the underlying parameter set for faster learning. In particular, we show that the FP-UCB algorithm achieves a bounded regret under some structural condition on the underlying parameter set. We also show that, if the underlying parameter set does not satisfy the necessary structural condition, the FP-UCB algorithm achieves a logarithmic regret, but with a smaller preceding constant compared to the standard UCB algorithm. We also validate the superior performance of the FP-UCB algorithm through extensive numerical simulations.

View on arXiv PDF

Similar