LG SY MLSep 6, 2024

Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes

Vishesh Mittal, Rahul Meshram, Surya Prakash

arXiv:2409.04605v12.6h-index: 2

Originality Incremental advance

AI Analysis

This work provides an incremental improvement in reinforcement learning methods for restless bandits, potentially aiding resource allocation in dynamic systems.

The authors tackled the problem of learning Whittle indices for restless multi-armed bandits using Q-learning and deep Q-networks with constant stepsizes, showing through numerical examples that their algorithms successfully learn the indices.

We study the Whittle index learning algorithm for restless multi-armed bandits. We consider index learning algorithm with Q-learning. We first present Q-learning algorithm with exploration policies -- epsilon-greedy, softmax, epsilon-softmax with constant stepsizes. We extend the study of Q-learning to index learning for single-armed restless bandit. The algorithm of index learning is two-timescale variant of stochastic approximation, on slower timescale we update index learning scheme and on faster timescale we update Q-learning assuming fixed index value. In Q-learning updates are in asynchronous manner. We study constant stepsizes two timescale stochastic approximation algorithm. We provide analysis of two-timescale stochastic approximation for index learning with constant stepsizes. Further, we present study on index learning with deep Q-network (DQN) learning and linear function approximation with state-aggregation method. We describe the performance of our algorithms using numerical examples. We have shown that index learning with Q learning, DQN and function approximations learns the Whittle index.

View on arXiv PDF

Similar