LG MLOct 4, 2025

Robust Batched Bandits

Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang

arXiv:2510.03798v1h-index: 1

Originality Incremental advance

AI Analysis

This addresses a gap in bandit algorithms for applications like clinical trials where rewards are heavy-tailed, offering incremental improvements over existing methods.

The paper tackles the batched multi-armed bandit problem by proposing robust algorithms for heavy-tailed reward distributions, showing that heavier tails reduce the number of batches needed for near-optimal regret in instance-independent and Lipschitz settings, while it remains unchanged in the instance-dependent setting.

The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world scenarios, including clinical outcomes, exhibit heavy-tailed characteristics. This paper bridges this gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings. We reveal a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards necessitate a smaller number of batches to achieve near-optimal regret. In stark contrast, for the instance-dependent setting, the required number of batches to attain near-optimal regret remains invariant with respect to tail heaviness.

View on arXiv PDF

Similar