PRLGOCMLAug 20, 2018

A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions

arXiv:1808.06314v22 citations
AI Analysis

It provides a general framework for multi-armed bandit problems, extending to various settings like continuous time and semi-Markovian cases, which is incremental but broadens applicability.

The paper tackles the problem of multi-armed bandit processes by introducing restrictions on arm switches in continuous time, constructing a Gittins index process and proving its optimality, with the new theory covering classical models and simplifying proofs.

This paper proposes a general framework of multi-armed bandit (MAB) processes by introducing a type of restrictions on the switches among arms evolving in continuous time. The Gittins index process is constructed for any single arm subject to the restrictions on switches and then the optimality of the corresponding Gittins index rule is established. The Gittins indices defined in this paper are consistent with the ones for MAB processes in continuous time, integer time, semi-Markovian setting as well as general discrete time setting, so that the new theory covers the classical models as special cases and also applies to many other situations that have not yet been touched in the literature. While the proof of the optimality of Gittins index policies benefits from ideas in the existing theory of MAB processes in continuous time, new techniques are introduced which drastically simplify the proof.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes