LG OC PRFeb 8, 2024

Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

arXiv:2402.05689v36.45 citationsh-index: 5Has CodeMath Oper Res

Originality Incremental advance

AI Analysis

This provides a solution for optimizing resource allocation in multi-armed bandit scenarios, but it is incremental as it builds on existing work by relaxing assumptions like GAP and SA.

The paper tackles the average-reward restless bandit problem by proposing a new class of policies that drive arms toward the optimal distribution, achieving asymptotic optimality with an O(1/√N) optimality gap for N arms under unichain and aperiodicity assumptions.

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, assuming only a unichain and aperiodicity assumption. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Global Attractor Property (GAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).

View on arXiv PDF Code

Similar