LGOCMLMay 11, 2023

Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits

arXiv:2305.06743v35 citations
Originality Incremental advance
AI Analysis

This work addresses a limitation in bandit algorithms for heavy-tailed data, which is incremental as it builds on prior INF methods.

The paper tackles the problem of multi-armed bandits with heavy-tailed reward distributions by proposing INF-clip, a new version of the Implicitly Normalized Forecaster algorithm, and shows it is optimal for linear settings and effective for non-linear ones, outperforming existing algorithms in ambiguous cases.

The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm fails to fully exploit the available data. In this paper, we propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes