General Formulation and PCL-Analysis for Restless Bandits with Limited Observability
This work addresses a general observation model for restless bandits, which is incremental as it extends existing methods to handle infinite-state spaces with partial observability.
The paper tackles the problem of restless multi-armed bandits with limited and noisy observations by formulating it as an infinite-state model and applying a partial conservation law analysis to derive indexability and Whittle index, proposing an approximation that enables the use of existing algorithms and showing excellent performance in numerical experiments.
In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player is based on the past observation history that is limited (partial) and error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of the observation process, we formulate the problem as a restless bandit with an infinite high-dimensional belief state space. We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process to transform the problem into which the AG algorithm of Niño-Mora (2001) for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.