Whittle index based Q-learning for restless bandits with average reward
This work addresses computational efficiency in restless bandit problems, a concern that matters for applications such as resource allocation. The contribution appears incremental, however, since it builds on existing Q-learning and Whittle index methods.
The paper tackles multiarmed restless bandits with average reward by introducing a reinforcement learning algorithm that combines Q-learning with the Whittle index policy. Exploiting the structure of that policy shrinks the Q-learning search space and yields major computational gains. The scheme is supported by a rigorous convergence analysis and by numerical experiments.
We introduce a novel reinforcement learning algorithm for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
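To make the search-space argument concrete, the following minimal sketch compares the size of a Q-table over joint states and joint actions with the size of per-arm tables, and shows the standard Whittle index policy rule of activating the arms with the largest current indices. It is an illustration under stated assumptions, not the algorithm analysed in the paper: the function names, the per-arm table layout, the tie-breaking rule, and the index values in the usage note are all hypothetical, and the indices are taken as given rather than learned.

```python
from math import comb


def joint_table_size(num_arms, num_states, num_active):
    """Entries in a Q-table indexed by joint state and joint action.

    Assumes every arm has num_states states and exactly num_active
    arms are activated at each step.
    """
    return (num_states ** num_arms) * comb(num_arms, num_active)


def per_arm_table_size(num_arms, num_states):
    """Entries when each arm keeps its own table over (state, passive/active).

    This decoupled layout is an assumption made for illustration.
    """
    return num_arms * num_states * 2


def whittle_index_policy(indices, states, num_active):
    """Return the arms to activate under an index policy.

    indices[i][s] is an (estimated) Whittle index of arm i in state s.
    The num_active arms with the largest current index are activated;
    ties are broken in favour of the lower arm number (an assumption).
    """
    current = [indices[i][s] for i, s in enumerate(states)]
    ranked = sorted(range(len(states)), key=lambda i: (-current[i], i))
    return sorted(ranked[:num_active])
```

For example, with 5 arms, 4 states per arm, and 2 active arms, `joint_table_size(5, 4, 2)` gives 10240 entries while `per_arm_table_size(5, 4)` gives 40. With hypothetical indices `[[0.1, 0.9], [0.5, 0.2], [0.3, 0.8]]` and current states `[1, 0, 0]`, `whittle_index_policy(..., 2)` activates arms 0 and 1.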