LGMLJun 22, 2025

Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels

arXiv:2506.18186v21 citationsh-index: 11
Originality Incremental advance
AI Analysis

This addresses a key limitation in applying Whittle index policies to real-world problems with time-varying environments, though it is an incremental improvement over existing methods.

The paper tackles optimal resource allocation in restless multi-armed bandits with unknown, non-stationary transition dynamics by proposing a Sliding-Window Online Whittle (SW-Whittle) policy, achieving a dynamic regret of $ ilde O(T^{2/3} ilde V^{1/3}+T^{4/5})$ and outperforming baselines in experiments.

We study optimal resource allocation in restless multi-armed bandits (RMABs) under unknown and non-stationary dynamics. Solving RMABs optimally is PSPACE-hard even with full knowledge of model parameters, and while the Whittle index policy offers asymptotic optimality with low computational cost, it requires access to stationary transition kernels - an unrealistic assumption in many applications. To address this challenge, we propose a Sliding-Window Online Whittle (SW-Whittle) policy that remains computationally efficient while adapting to time-varying kernels. Our algorithm achieves a dynamic regret of $\tilde O(T^{2/3}\tilde V^{1/3}+T^{4/5})$ for large RMABs, where $T$ is the number of episodes and $\tilde V$ is the total variation distance between consecutive transition kernels. Importantly, we handle the challenging case where the variation budget is unknown in advance by combining a Bandit-over-Bandit framework with our sliding-window design. Window lengths are tuned online as a function of the estimated variation, while Whittle indices are computed via an upper-confidence-bound of the estimated transition kernels and a bilinear optimization routine. Numerical experiments demonstrate that our algorithm consistently outperforms baselines, achieving the lowest cumulative regret across a range of non-stationary environments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes