LG OC MLMay 11, 2023

Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits

Yuriy Dorn, Nikita Kornilov, Nikolay Kutuzov, Alexander Nazin, Eduard Gorbunov, Alexander Gasnikov

arXiv:2305.06743v33.85 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses a limitation in bandit algorithms for heavy-tailed data, which is incremental as it builds on prior INF methods.

The paper tackles the problem of multi-armed bandits with heavy-tailed reward distributions by proposing INF-clip, a new version of the Implicitly Normalized Forecaster algorithm, and shows it is optimal for linear settings and effective for non-linear ones, outperforming existing algorithms in ambiguous cases.

The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm fails to fully exploit the available data. In this paper, we propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.

View on arXiv PDF Code

Similar