Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
This addresses a critical gap in robust machine learning for scenarios with extreme noise, such as financial or environmental data, though it is incremental as it builds on existing bandit algorithms.
The paper tackles the problem of bandit learning with super heavy-tailed payoffs, where moments may not exist, by introducing a mean of medians estimator and a reductionist framework that achieves near-optimal regret bounds comparable to sub-Gaussian cases.
Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known when moments of the error can become non-existential: the random noise $η$ satisfies Pr$\left[|η| > |y|\right] \le 1/|y|^α$ for some $α> 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems: We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problem): the mean of medians estimator can be applied to nearly any bandit learning algorithm as a black-box filtering for its reward signals and obtain similar regret bound as if the reward is sub-Gaussian. We show that the regret bound is near-optimal even with very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.