LG OC MLFeb 10, 2024

Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov

arXiv:2402.07062v16.43 citationsh-index: 3AAMAS

Originality Highly original

AI Analysis

This work addresses the challenge of robust decision-making in bandit problems with heavy-tailed noise, which is incremental as it builds on existing UCB methods but introduces a novel optimization-based approach for improved performance under extreme noise conditions.

The authors tackled the problem of stochastic multi-armed bandits with heavy-tailed or super-heavy-tailed symmetric noise in rewards, proposing a new UCB-type algorithm called Clipped-SGD-UCB that achieves an O(log T √(KT log T)) regret bound, outperforming the general lower bound of O(T^(1/(1+α)) K^(α/(1+α))) for such distributions.

In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left (T^{\frac{1}{1+α}} K^{\fracα{1+α}} \right)$ for the case when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+α}] \leq σ^{1+α}$ ($α\in (0, 1])$, i.e. perform better than it is assumed by the general lower bound for bandits with heavy-tails. Moreover, the same bound holds even when the reward distribution does not have the expectation, that is, when $α<0$.

View on arXiv PDF

Similar