LG OC ML

Apr 29, 2020

Whittle index based Q-learning for restless bandits with average reward

Konstantin E. Avrachenkov, Vivek S. Borkar

arXiv:2004.14427v3

19.185 citations

Originality: Incremental advance

AI Analysis

This work addresses computational efficiency in restless bandit problems, which is important for applications like resource allocation, but it appears incremental as it builds on existing Q-learning and Whittle index methods.

The paper tackles multiarmed restless bandits with average reward by introducing a reinforcement learning algorithm that combines Q-learning with the Whittle index policy. Exploiting the structure of the index policy shrinks the search space of Q-learning, which yields major computational gains. The scheme is backed by a convergence analysis, and numerical experiments show strong empirical performance.

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
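The abstract does not spell out the update rules, so the following is only a minimal sketch of how a Whittle-index-based Q-learning scheme could look, not the authors' algorithm. It assumes statistically identical arms that share one index table, a per-reference-state Q-table `Q[k]`, and a two-timescale update in which the index estimate `lam[k]` is nudged toward the passive subsidy that makes both actions equally attractive in state `k`. The function names, the step sizes `alpha` and `beta`, and all numbers are hypothetical. One way to read the search-space reduction is that learning happens on single-arm tables rather than on a table over joint states of all arms.

```python
import numpy as np


def select_active_arms(index_table, arm_states, m):
    """Index policy step: activate the m arms whose current states
    carry the largest (estimated) Whittle indices.

    Assumption: all arms are statistically identical, so they share
    one index table with one entry per state.
    """
    current = np.asarray(index_table)[np.asarray(arm_states)]
    # Stable sort on the negated indices: ties go to the lower arm number.
    order = np.argsort(-current, kind="stable")
    return np.sort(order[:m])


def q_and_index_step(Q, lam, k, s, a, r, s_next, alpha, beta):
    """One sample update for reference state k (hypothetical form).

    Q has shape (d, d, 2): Q[k] is the single-arm Q-table learned under
    the subsidy lam[k]. Only the passive action (a == 0) earns the subsidy.
    """
    subsidy = lam[k] if a == 0 else 0.0
    # Average-reward (relative) Q-learning target: the mean of Q[k] serves
    # as the reference offset used in place of a discount factor.
    target = r + subsidy + Q[k, s_next].max() - Q[k].mean()
    Q[k, s, a] += alpha * (target - Q[k, s, a])
    # Slower timescale: move lam[k] toward the subsidy at which active
    # and passive are equally attractive in state k.
    lam[k] += beta * (Q[k, k, 1] - Q[k, k, 0])


index_table = np.array([0.1, 0.7, 0.4])  # illustrative values, not from the paper
arm_states = np.array([2, 0, 1, 1, 0])   # current state of each of 5 arms
active = select_active_arms(index_table, arm_states, m=2)

Q = np.zeros((3, 3, 2))
lam = np.zeros(3)
q_and_index_step(Q, lam, k=0, s=0, a=1, r=1.0, s_next=1, alpha=0.5, beta=0.1)
```

In this sketch `select_active_arms` is the decision rule applied across arms, while `q_and_index_step` would be called on each observed single-arm transition; in a two-timescale design `beta` would typically decay faster than `alpha`.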
