LGSYOct 30, 2024

Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem

arXiv:2410.23029v2h-index: 21
Originality Incremental advance
AI Analysis

This work addresses risk management in resource allocation for applications like machine replacement and patient scheduling, representing an incremental extension of traditional restless multi-arm bandits.

The authors tackled the restless multi-arm bandit problem by incorporating risk-awareness into the objective, establishing indexability conditions and proposing a Whittle index solution, along with a Thompson sampling approach for unknown transition probabilities that achieves bounded regret scaling sublinearly with episodes and quadratically with arms.

In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments in the contexts of machine replacement and patient scheduling applications under both planning and learning setups.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes