Bandits with heavy tail
This work addresses the robustness of bandit algorithms in real-world applications with heavy-tailed data, extending the theory beyond sub-Gaussian assumptions.
The paper tackles the stochastic multi-armed bandit problem under the weaker assumption that the reward distributions are heavy-tailed, with finite moments of order 1+ε only, for some ε in (0,1]. It shows that finite variance (ε=1) is sufficient to achieve regret bounds of the same order as in the sub-Gaussian case, and it derives matching lower bounds showing that the best achievable regret deteriorates when ε<1.
The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order 1+ε, for some $ε\in (0,1]$. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. To achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds, which show that the best achievable regret deteriorates when ε<1.
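
To make two of the estimators named above concrete, here is a minimal Python sketch of a truncated empirical mean and a median-of-means estimator. It is an illustration under stated assumptions, not the paper's implementation. The function names and the parameters `u` (an assumed upper bound on the raw moment of order 1+ε), `delta` (a confidence level in (0,1)), and `num_blocks` are hypothetical. The truncation threshold $(u t / \log(1/\delta))^{1/(1+ε)}$ is one commonly stated form and should be treated as an assumption here. The number of blocks is left to the caller rather than tied to any particular constant.

```python
import math
import statistics


def truncated_mean(samples, u, eps, delta):
    """Truncated empirical mean (sketch).

    The t-th sample is kept only if its absolute value is at most
    (u * t / log(1/delta)) ** (1 / (1 + eps)); otherwise it counts as 0.
    The threshold form is an assumption made for this illustration.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("samples must be non-empty")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    # Divide by n, not by the number of kept samples.
    return total / n


def median_of_means(samples, num_blocks):
    """Median-of-means (sketch).

    Splits the samples into num_blocks consecutive blocks of equal size
    (leftover samples at the end are dropped), averages each block, and
    returns the median of the block averages.
    """
    n = len(samples)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = n // num_blocks
    block_means = [
        sum(samples[j * block_size:(j + 1) * block_size]) / block_size
        for j in range(num_blocks)
    ]
    return statistics.median(block_means)
```

As a usage example, `median_of_means([1, 1, 1, 1, 1, 1, 1, 1, 1000], 3)` returns 1.0, whereas the plain empirical mean of the same data is 112.0: a single extreme reward corrupts at most one block, so the median of the block averages is unaffected. In a bandit strategy of the kind the abstract describes, such an estimate would replace the empirical mean inside an upper-confidence-bound index, with the confidence width adapted to ε.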