LGOCPRFeb 8, 2024

Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

arXiv:2402.05689v35 citationsh-index: 5Math Oper Res
Originality Incremental advance
AI Analysis

This provides a solution for optimizing resource allocation in multi-armed bandit scenarios, but it is incremental as it builds on existing work by relaxing assumptions like GAP and SA.

The paper tackles the average-reward restless bandit problem by proposing a new class of policies that drive arms toward the optimal distribution, achieving asymptotic optimality with an O(1/√N) optimality gap for N arms under unichain and aperiodicity assumptions.

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, assuming only a unichain and aperiodicity assumption. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Global Attractor Property (GAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes