LG MLMay 30, 2025

Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms

Derek Everett, Fred Lu, Edward Raff, Fernando Camacho, James Holt

arXiv:2505.24692v14.1h-index: 11KDD

Originality Highly original

AI Analysis

This work addresses the challenge of multi-armed bandits in nonstationary, high-dimensional settings, which is incremental as it combines handling nonstationarity and many arms, previously tackled separately.

The authors tackled the problem of optimizing in nonstationary environments with a large number of arms by proposing a novel policy using Gaussian interpolation, achieving O*(√T) cumulative regret and demonstrating 100-10000x faster computation while outperforming existing methods on nonstationary datasets.

Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: existing non-stationary bandit policies are designed for a small number of arms, while Lipschitz, linear, and Gaussian process bandit policies are designed to handle a large (or infinite) number of arms in stationary reward environments under constraints on the reward function. In this manuscript, we propose a novel policy to learn reward environments over a continuous space using Gaussian interpolation. We show that our method efficiently learns continuous Lipschitz reward functions with $\mathcal{O}^*(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification. We finally demonstrate that our method is computationally favorable (100-10000x faster) and experimentally outperforms sliding Gaussian process policies on datasets with non-stationarity and an extremely large number of arms.

View on arXiv PDF

Similar